Tufts University
Department of Urban and Environmental Policy and Planning
UEP-239: Geospatial Programming with Python
Final Project
By: Justina Cheng
May 2021
The purpose of this suitability analysis is to find the best Zip Code Tabulation Area (ZCTA) within the Boston Area for a Tufts University student and a Boston University student to live together without a car (i.e. somewhere equally convenient for both). Though the self-serving nature of the study is specific to students of Tufts and BU, a similar workflow can be conducted as a general analysis of the Boston Area (by omitting the network analysis between Tufts and BU) or by specifying different locations of interest (e.g. HubSpot's Cambridge office and the Museum of Fine Arts). The workflow can also be applied to any city or region, provided data is available.
The study is structured as follows:
environment.yml file.References and data used an be found in the README file.
import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import seaborn as sns
import folium
from folium import Choropleth, Marker
from folium.plugins import HeatMap, MarkerCluster
import osmnx as ox
import networkx as nx
from shapely.geometry import LineString, Point, Polygon, box
To create a GeoDataFrame of the Boston Region ZCTAs, the following steps were used:
convert_n_clip was created to convert a GDF to the CRS of another GDF and clip to the other's extent.read_n_clip was created to read in a shapefile and use convert_n_clip to convert it to the CRS of another GDF and clip to the other's extent.read_n_clip with the extent of Boston ZCTAs. # Import outline of detailed Massachusetts coastline.
outline_25k = gpd.read_file("./data/outline25k/OUTLINE25K_POLY.shp")
outline_25k.info()
<class 'geopandas.geodataframe.GeoDataFrame'> RangeIndex: 918 entries, 0 to 917 Data columns (total 5 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 AREA_ACRES 918 non-null float64 1 OBJECTID 918 non-null int64 2 SHAPE_AREA 918 non-null float64 3 SHAPE_LEN 918 non-null float64 4 geometry 918 non-null geometry dtypes: float64(3), geometry(1), int64(1) memory usage: 36.0 KB
# View first five rows.
outline_25k.head()
| AREA_ACRES | OBJECTID | SHAPE_AREA | SHAPE_LEN | geometry | |
|---|---|---|---|---|---|
| 0 | 4.794042e+06 | 1 | 1.940080e+10 | 2.457234e+06 | POLYGON ((262720.970 931706.880, 262726.090 93... |
| 1 | 6.400000e-02 | 2 | 2.589837e+02 | 6.441277e+01 | POLYGON ((247826.580 955016.130, 247812.530 95... |
| 2 | 1.756000e-01 | 3 | 7.104337e+02 | 1.160627e+02 | POLYGON ((249219.860 954550.130, 249209.020 95... |
| 3 | 4.402800e+00 | 4 | 1.781757e+04 | 6.075335e+02 | POLYGON ((248952.170 954025.880, 248958.480 95... |
| 4 | 6.554800e+00 | 5 | 2.652649e+04 | 7.402520e+02 | POLYGON ((248624.580 953915.130, 248580.140 95... |
# View CRS and plot.
print(outline_25k.crs)
outline_25k.plot(figsize=(12,12))
plt.title('Massachusetts Detailed Coastline', fontsize=16)
plt.show()
epsg:26986
# Import Zip Code Tabulation Areas within Massachusetts.
ma_zcta = gpd.read_file("./data/tl_2010_25_zcta500/tl_2010_25_zcta500.shp")
ma_zcta.info()
<class 'geopandas.geodataframe.GeoDataFrame'> RangeIndex: 501 entries, 0 to 500 Data columns (total 12 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 STATEFP00 501 non-null object 1 ZCTA5CE00 501 non-null object 2 GEOID00 501 non-null object 3 CLASSFP00 501 non-null object 4 MTFCC00 501 non-null object 5 FUNCSTAT00 501 non-null object 6 ALAND00 501 non-null int64 7 AWATER00 501 non-null int64 8 INTPTLAT00 501 non-null object 9 INTPTLON00 501 non-null object 10 PARTFLG00 501 non-null object 11 geometry 501 non-null geometry dtypes: geometry(1), int64(2), object(9) memory usage: 47.1+ KB
# View first five rows.
ma_zcta.head()
| STATEFP00 | ZCTA5CE00 | GEOID00 | CLASSFP00 | MTFCC00 | FUNCSTAT00 | ALAND00 | AWATER00 | INTPTLAT00 | INTPTLON00 | PARTFLG00 | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25 | 02664 | 2502664 | B5 | G6350 | S | 19132195 | 361261 | +41.6747273 | -070.1956478 | N | POLYGON ((-70.21397 41.67049, -70.21416 41.671... |
| 1 | 25 | 02534 | 2502534 | B5 | G6350 | S | 6982729 | 59105 | +41.6690041 | -070.6152799 | N | MULTIPOLYGON (((-70.62981 41.66786, -70.62963 ... |
| 2 | 25 | 02657 | 2502657 | B5 | G6350 | S | 25037286 | 466360 | +42.0622998 | -070.1993066 | N | MULTIPOLYGON (((-70.23665 42.06287, -70.23676 ... |
| 3 | 25 | 02652 | 2502652 | B5 | G6350 | S | 21757561 | 46932 | +42.0412079 | -070.0897345 | N | POLYGON ((-70.14695 42.06466, -70.14691 42.064... |
| 4 | 25 | 02649 | 2502649 | B5 | G6350 | S | 60695300 | 6578141 | +41.6214759 | -070.4909354 | N | MULTIPOLYGON (((-70.46531 41.59374, -70.46547 ... |
# View CRS and plot.
print(ma_zcta.crs)
ma_zcta.plot(figsize=(12,12))
plt.title('Massachusetts ZCTAs', fontsize=16)
plt.show()
epsg:4269
# Convert CRSs to Massachusetts Mainland EPSG 6491.
outline_25k = outline_25k.to_crs('epsg:6491')
ma_zcta = ma_zcta.to_crs('epsg:6491')
# Confirm CRSs match.
outline_25k.crs == ma_zcta.crs
True
# Clip ZCTA GDF to 25k MA outline.
ma_zcta_25k = gpd.clip(ma_zcta, outline_25k)
ma_zcta_25k.info()
<class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 501 entries, 0 to 500 Data columns (total 12 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 STATEFP00 501 non-null object 1 ZCTA5CE00 501 non-null object 2 GEOID00 501 non-null object 3 CLASSFP00 501 non-null object 4 MTFCC00 501 non-null object 5 FUNCSTAT00 501 non-null object 6 ALAND00 501 non-null int64 7 AWATER00 501 non-null int64 8 INTPTLAT00 501 non-null object 9 INTPTLON00 501 non-null object 10 PARTFLG00 501 non-null object 11 geometry 501 non-null geometry dtypes: geometry(1), int64(2), object(9) memory usage: 50.9+ KB
# Import boundaries from Boston Region Metropolitan Planning Organization.
mpo = gpd.read_file("./data/MPO_Boundaries/MPO_Boundaries.shp")
mpo.info()
<class 'geopandas.geodataframe.GeoDataFrame'> RangeIndex: 13 entries, 0 to 12 Data columns (total 10 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 OBJECTID 13 non-null int64 1 MPO 13 non-null object 2 created_us 0 non-null object 3 created_da 13 non-null object 4 last_edite 2 non-null object 5 last_edi_1 13 non-null object 6 GlobalID 13 non-null object 7 ShapeSTAre 13 non-null float64 8 ShapeSTLen 13 non-null float64 9 geometry 13 non-null geometry dtypes: float64(2), geometry(1), int64(1), object(6) memory usage: 1.1+ KB
# View MPO dataset.
mpo
| OBJECTID | MPO | created_us | created_da | last_edite | last_edi_1 | GlobalID | ShapeSTAre | ShapeSTLen | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | Berkshire | None | 1970-01-01 | None | 1970-01-01 | {08FDA544-18B0-412A-B442-287E53E987F7} | 2.451015e+09 | 2.471530e+05 | POLYGON ((-8128884.676 5272654.345, -8128962.2... |
| 1 | 3 | Cape Cod | None | 1970-01-01 | None | 1970-01-01 | {B6CD90CF-2F7D-43F2-B251-FA7F8E00EF01} | 1.067067e+09 | 1.288227e+06 | MULTIPOLYGON (((-7813968.781 5173329.197, -781... |
| 2 | 4 | Central Massachusetts | None | 1970-01-01 | None | 1970-01-01 | {CC777E14-53C8-42AD-B421-71444DA0BB60} | 2.487546e+09 | 2.683265e+05 | POLYGON ((-7977225.352 5223837.273, -7973861.8... |
| 3 | 5 | Franklin | None | 1970-01-01 | None | 1970-01-01 | {4804E708-6B89-4A85-9383-BD91F7589981} | 1.876456e+09 | 2.527017e+05 | POLYGON ((-8046511.241 5269691.856, -8045276.8... |
| 4 | 6 | Montachusett | None | 1970-01-01 | None | 1970-01-01 | {F315DA63-C9CF-40EE-8AA7-5ABA2E1FD528} | 1.772355e+09 | 2.748684e+05 | POLYGON ((-7976246.504 5267152.001, -7976121.9... |
| 5 | 7 | Martha's Vineyard | None | 1970-01-01 | None | 1970-01-01 | {84077DC9-D5D1-471D-9A64-1E5748F80B92} | 2.757449e+08 | 3.701721e+05 | MULTIPOLYGON (((-7859473.886 5083806.270, -785... |
| 6 | 8 | Merrimack Valley | None | 1970-01-01 | None | 1970-01-01 | {C09CD5BA-4FE1-45DD-838E-19F2ECF618DB} | 7.188680e+08 | 3.847121e+05 | MULTIPOLYGON (((-7890949.271 5294156.354, -788... |
| 7 | 9 | Northern Middlesex | None | 1970-01-01 | None | 1970-01-01 | {5B2D231C-F8EB-4768-AB7F-F9D02B3EDBBD} | 5.073539e+08 | 1.410283e+05 | POLYGON ((-7922038.111 5250986.670, -7923186.5... |
| 8 | 10 | Nantucket | None | 1970-01-01 | None | 1970-01-01 | {633A0B7F-266B-4F07-AA30-EC320231ADAA} | 1.266795e+08 | 1.680333e+05 | MULTIPOLYGON (((-7797087.819 5069759.518, -779... |
| 9 | 11 | Pioneer Valley | None | 1970-01-01 | None | 1970-01-01 | {B01F3417-DC74-4561-AE49-935ACC6EF1FF} | 3.054352e+09 | 3.206858e+05 | POLYGON ((-8049986.018 5212603.033, -8049953.4... |
| 10 | 12 | Boston Region | None | 1970-01-01 | DINOCCOD | 2018-04-18 | {3801574E-3CF1-4344-BE7C-8B2FBA431DD8} | 3.524379e+09 | 1.665026e+06 | MULTIPOLYGON (((-7875339.226 5247387.185, -787... |
| 11 | 13 | Old Colony | None | 1970-01-01 | DINOCCOD | 2018-04-18 | {BEE05B06-8BC7-463C-9C7D-1353E32150C9} | 9.938172e+08 | 3.368737e+05 | MULTIPOLYGON (((-7909974.763 5184049.466, -790... |
| 12 | 14 | Southeastern Massachusetts | None | 1970-01-01 | None | 1970-01-01 | {AEEBDC7F-368A-402A-8D40-EC12E6040D90} | 2.087314e+09 | 9.014607e+05 | MULTIPOLYGON (((-7922461.407 5170146.823, -792... |
# Convert MPO CRS to EPSG 6491 and plot.
mpo = mpo.to_crs('epsg:6491')
mpo.plot(figsize=(12,12))
plt.title('Boston Region MPO Boundaries', fontsize=16)
plt.show()
# Extract only Boston Region from MPO.
boston_region = mpo.loc[mpo.MPO == 'Boston Region'].reset_index()
boston_region
| index | OBJECTID | MPO | created_us | created_da | last_edite | last_edi_1 | GlobalID | ShapeSTAre | ShapeSTLen | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 | 12 | Boston Region | None | 1970-01-01 | DINOCCOD | 2018-04-18 | {3801574E-3CF1-4344-BE7C-8B2FBA431DD8} | 3.524379e+09 | 1.665026e+06 | MULTIPOLYGON (((261954.246 925125.468, 261961.... |
# Extract ZCTAs within the Boston Region using the centroid of the ZCTAs.
boston_zcta = ma_zcta_25k[ma_zcta_25k.centroid.within(boston_region.geometry[0])].reset_index()
boston_zcta.info()
<class 'geopandas.geodataframe.GeoDataFrame'> RangeIndex: 159 entries, 0 to 158 Data columns (total 13 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 index 159 non-null int64 1 STATEFP00 159 non-null object 2 ZCTA5CE00 159 non-null object 3 GEOID00 159 non-null object 4 CLASSFP00 159 non-null object 5 MTFCC00 159 non-null object 6 FUNCSTAT00 159 non-null object 7 ALAND00 159 non-null int64 8 AWATER00 159 non-null int64 9 INTPTLAT00 159 non-null object 10 INTPTLON00 159 non-null object 11 PARTFLG00 159 non-null object 12 geometry 159 non-null geometry dtypes: geometry(1), int64(3), object(9) memory usage: 16.3+ KB
# View the Boston Region ZCTAs.
boston_zcta
| index | STATEFP00 | ZCTA5CE00 | GEOID00 | CLASSFP00 | MTFCC00 | FUNCSTAT00 | ALAND00 | AWATER00 | INTPTLAT00 | INTPTLON00 | PARTFLG00 | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 31 | 25 | 02458 | 2502458 | B5 | G6350 | S | 4709183 | 57045 | +42.3541154 | -071.1891300 | N | MULTIPOLYGON (((227500.026 901300.219, 227482.... |
| 1 | 32 | 25 | 01867 | 2501867 | B5 | G6350 | S | 25857991 | 46300 | +42.5351829 | -071.1054234 | N | POLYGON ((234193.323 919471.462, 234167.381 91... |
| 2 | 33 | 25 | 01880 | 2501880 | B5 | G6350 | S | 18969895 | 1554312 | +42.5043600 | -071.0640609 | N | POLYGON ((233593.527 918646.107, 233786.679 91... |
| 3 | 34 | 25 | 01730 | 2501730 | B5 | G6350 | S | 34703585 | 424511 | +42.4993313 | -071.2819054 | N | POLYGON ((215991.167 913794.709, 215916.137 91... |
| 4 | 35 | 25 | 02141 | 2502141 | B5 | G6350 | S | 1572423 | 44304 | +42.3702998 | -071.0825603 | N | POLYGON ((234064.636 902597.603, 234080.228 90... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 154 | 489 | 25 | 02030 | 2502030 | B5 | G6350 | S | 39440450 | 832120 | +42.2341291 | -071.2834628 | N | POLYGON ((215268.989 884982.344, 215020.550 88... |
| 155 | 490 | 25 | 02052 | 2502052 | B5 | G6350 | S | 37335006 | 608718 | +42.1845991 | -071.3053065 | N | POLYGON ((214616.455 880362.751, 214622.559 88... |
| 156 | 493 | 25 | 02111 | 2502111 | B5 | G6350 | S | 754214 | 79209 | +42.3487843 | -071.0589880 | N | POLYGON ((236144.915 900687.565, 236153.840 90... |
| 157 | 494 | 25 | 02130 | 2502130 | B5 | G6350 | S | 11393804 | 369565 | +42.3072770 | -071.1140555 | N | POLYGON ((231406.529 893484.217, 231413.864 89... |
| 158 | 495 | 25 | 02222 | 2502222 | B5 | G6350 | S | 77896 | 0 | +42.3664695 | -071.0624939 | N | POLYGON ((235839.790 902134.685, 235852.488 90... |
159 rows × 13 columns
# Plot the Boston Region ZCTAs.
boston_zcta.plot(figsize=(12,12))
plt.title('ZCTAs within Boston Region', fontsize=16)
plt.show()
convert_n_clip and read_n_clip¶These two functions will be used to read in, convert the CRS, and clip the extent of GDFs throughout the analysis.
convert_n_clip takes two GeoDataFrames (GDF): one to process (gdf) and one whose extent will be used to clip. The function converts the coordinate reference system (CRS) of the original GDF and clips it to the extent of the extent GDF.
def convert_n_clip(orig_gdf, extent_gdf):
"""
Takes two GeoDataFrames (GDF): one to process (orig_gdf) and one whose extent will be used to clip (extent_gdf).
Converts the coordinate reference system (CRS) of the orig_gdf to the CRS of extent_gdf.
Clips to the extent of orig_gdf to the extent of extent_gdf.
Returns clipped GDF.
Requires GeoPandas to run.
Inputs:
orig_gdf = GDF to process
extent_gdf = GDF whose extent to use
Example:
ma_schools = convert_n_clip(usa_schools, ma_boundary)
"""
orig_gdf = orig_gdf.to_crs(extent_gdf.crs)
clipped_gdf = gpd.clip(orig_gdf, extent_gdf)
return clipped_gdf
read_n_clip takes a filepath for a shapefile and a GDF whose extent will be used to clip the shapefile. The function reads in the shapefile and uses convert_n_clip to convert the coordinate reference system (CRS) of the original GDF and clip it to the extent of the extent GDF.
def read_n_clip(filepath, extent_gdf):
"""
Takes a filepath for a shapefile and a GeoDataFrame (GDF).
Reads in the file.
Uses convert_n_clip function to convert to the coordinate reference system (CRS)
of the GDF and clip to the extent of the GDF.
Returns clipped GDF.
Requires GeoPandas to run.
Inputs:
filepath = relative filepath for shapefile to read
extent_gdf = GDF whose extent to use
Example:
ma_water = read_n_clip('./data/usa/water.shp', ma_boundary)
"""
shapefile = gpd.read_file(filepath)
clipped_shapefile = convert_n_clip(shapefile, extent_gdf)
return clipped_shapefile
Though not used for the analysis, surface water in the Boston region is a helpful landmark and reference point for orienting one to the map or for figuring out why some data may appear strange (e.g. why there are no rental points in an area).
Surface water in this analysis is added to some maps for reference, where appropriate.
# read_n_clip Boston surface water.
boston_water = read_n_clip('./data/hydro25k/HYDRO25K_POLY.shp', boston_zcta)
print(boston_water.crs)
boston_water.info()
epsg:6491 <class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 12720 entries, 7279 to 71450 Data columns (total 13 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 MINOR_TOT 12720 non-null object 1 POLY_CODE 12720 non-null int64 2 PWSID 87 non-null object 3 PALIS_ID 12720 non-null int64 4 SOURCE 12720 non-null object 5 MINOR_NUM 12720 non-null float64 6 RESOLUTION 12720 non-null object 7 NAME 785 non-null object 8 JOIN_ID 12720 non-null int64 9 COASTAL 12720 non-null int64 10 SHAPE_AREA 12720 non-null float64 11 SHAPE_LEN 12720 non-null float64 12 geometry 12720 non-null geometry dtypes: float64(3), geometry(1), int64(4), object(5) memory usage: 1.4+ MB
# View first five rows.
boston_water.head()
| MINOR_TOT | POLY_CODE | PWSID | PALIS_ID | SOURCE | MINOR_NUM | RESOLUTION | NAME | JOIN_ID | COASTAL | SHAPE_AREA | SHAPE_LEN | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7279 | 111007 | 2 | None | 0 | USGS/MGIS | 111007.0 | HIGH | None | 0 | 0 | 5.105577e+03 | 271.131343 | POLYGON ((251803.109 932591.312, 251783.391 93... |
| 7381 | 111608 | 5 | None | 0 | USGS/MGIS | 111608.0 | HIGH | None | 0 | 0 | 1.093061e+06 | 17343.738547 | POLYGON ((260520.352 931645.959, 260520.470 93... |
| 7382 | 421619 | 6 | None | 0 | USGS/MGIS | 421619.0 | HIGH | None | 0 | 0 | 1.783163e+02 | 51.525859 | POLYGON ((261529.594 932830.875, 261534.062 93... |
| 7387 | 111007 | 2 | None | 0 | USGS/MGIS | 111007.0 | HIGH | None | 0 | 0 | 6.195342e+03 | 365.598343 | POLYGON ((250616.797 939017.812, 250573.125 93... |
| 7388 | 111007 | 2 | None | 0 | USGS/MGIS | 111007.0 | HIGH | None | 0 | 0 | 1.066482e+03 | 185.964487 | POLYGON ((250196.344 939024.375, 250225.578 93... |
# Plot the Boston Region ZCTAs with surface water.
ax = boston_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
boston_water.plot(ax=ax)
plt.title('ZCTAs and Waterways within Boston Region', fontsize=16)
plt.show()
Mass/public transit in the Boston area is primarily governed by the Massachusetts Bay Transportation Authority (MBTA) and is available in the form of buses, rapid transit (i.e. the T), and the commuter rail (which is used for traveling to and from the greater metro area). MBTA bus, MBTA T, and MBTA Commuter Rail stops and routes data were all obtained from MassGIS.
Stops were read in and processed with read_n_clip. However, because routes often go over or under water, routes were read in with gpd.read_file and converted to the proper CRS.
# read_n_clip MBTA bus stops, check CRS, and view info.
bos_bus_stop = read_n_clip('./data/mbtabus/MBTABUSSTOPS_PT.shp', boston_zcta)
print(bos_bus_stop.crs)
bos_bus_stop.info()
epsg:6491 <class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 7710 entries, 0 to 7809 Data columns (total 5 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 STOP_ID 7710 non-null int64 1 STOP_NAME 7710 non-null object 2 TOWN 7710 non-null object 3 TOWN_ID 7710 non-null int64 4 geometry 7710 non-null geometry dtypes: geometry(1), int64(2), object(2) memory usage: 361.4+ KB
bos_bus_stop.head()
| STOP_ID | STOP_NAME | TOWN | TOWN_ID | geometry | |
|---|---|---|---|---|---|
| 0 | 3077 | Gallivan Blvd @ opp Marsh St | BOSTON | 35 | POINT (237120.669 892643.408) |
| 1 | 841 | Lagrange St @ Virgil Rd | BOSTON | 35 | POINT (227915.195 892644.017) |
| 2 | 446 | Norfolk St @ Nelson St | BOSTON | 35 | POINT (234385.661 892644.944) |
| 3 | 847 | Lagrange St opp Virgil St | BOSTON | 35 | POINT (227912.601 892650.156) |
| 4 | 3079 | Adams St @ Minot St | BOSTON | 35 | POINT (236644.812 892651.990) |
# Read in MBTA bus transit routes, convert CRS, check CRS, and view info.
bos_bus_route = gpd.read_file('./data/mbtabus/MBTABUSROUTES_ARC.shp')
bos_bus_route = bos_bus_route.to_crs(boston_zcta.crs)
print(bos_bus_route.crs)
bos_bus_route.info()
epsg:6491 <class 'geopandas.geodataframe.GeoDataFrame'> RangeIndex: 923 entries, 0 to 922 Data columns (total 12 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 SHAPE_ID 923 non-null object 1 MBTA_ROUTE 923 non-null object 2 MBTA_VARIA 920 non-null object 3 MBTA_ROU_1 923 non-null object 4 CTPS_ROUTE 923 non-null int64 5 CTPS_ROU_1 923 non-null float64 6 DIRECTION 923 non-null int64 7 ROUTE_DESC 923 non-null object 8 TRIP_HEADS 923 non-null object 9 CTPS_ROU_2 923 non-null int64 10 SHAPE_LEN 923 non-null float64 11 geometry 923 non-null geometry dtypes: float64(2), geometry(1), int64(3), object(6) memory usage: 86.7+ KB
bos_bus_route.head()
| SHAPE_ID | MBTA_ROUTE | MBTA_VARIA | MBTA_ROU_1 | CTPS_ROUTE | CTPS_ROU_1 | DIRECTION | ROUTE_DESC | TRIP_HEADS | CTPS_ROU_2 | SHAPE_LEN | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 160025 | 16 | 2 | 16_2 | 16 | 16.0 | 1 | Route 16_2 inbound | UMass | 16201 | 11447.693160 | LINESTRING (231866.335 894511.836, 231854.966 ... |
| 1 | 160003 | 16 | 3 | 16_3 | 16 | 16.0 | 1 | Route 16_3 inbound | Andrew | 16301 | 7182.299738 | LINESTRING (231866.335 894511.836, 231854.966 ... |
| 2 | 050009 | 05 | _ | 05_ | 5 | 5.0 | 0 | Route 05_ outbound | City Point | 5000 | 6566.593988 | LINESTRING (236875.676 897653.501, 236875.838 ... |
| 3 | 070011 | 07 | 2 | 07_2 | 7 | 7.0 | 0 | Route 07_2 outbound | City Point via East First St | 7200 | 3152.476054 | LINESTRING (236371.249 900507.531, 236360.463 ... |
| 4 | 080015 | 08 | 9 | 08_9 | 8 | 8.0 | 0 | Route 08_9 outbound | Harbor Point via South Bay Center | 8900 | 15339.996640 | LINESTRING (233318.701 899904.902, 233323.933 ... |
Stops were read in and processed with read_n_clip. However, because routes often go over or under water, routes were read in with gpd.read_file and converted to the proper CRS.
# read_n_clip MBTA rapid transit (T) stops, check CRS, and view info.
bos_rt_node = read_n_clip('./data/mbta_rapid_transit/MBTA_NODE.shp', boston_zcta)
print(bos_rt_node.crs)
bos_rt_node.info()
epsg:6491 <class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 166 entries, 0 to 165 Data columns (total 5 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 STATION 166 non-null object 1 LINE 166 non-null object 2 TERMINUS 166 non-null object 3 ROUTE 166 non-null object 4 geometry 166 non-null geometry dtypes: geometry(1), object(4) memory usage: 7.8+ KB
# View first five rows.
bos_rt_node.head()
| STATION | LINE | TERMINUS | ROUTE | geometry | |
|---|---|---|---|---|---|
| 0 | Ashmont | RED | Y | A - Ashmont C - Alewife | POINT (236007.538 892693.023) |
| 1 | Harvard | RED | N | A - Ashmont B - Braintree C - Alewife | POINT (231387.274 902684.016) |
| 2 | Kendall/MIT | RED | N | A - Ashmont B - Braintree C - Alewife | POINT (234087.917 901406.551) |
| 3 | Capen Street | RED | N | Mattapan Trolley | POINT (234055.438 890869.375) |
| 4 | Tufts Medical Center | ORANGE | N | Forest Hills to Oak Grove | POINT (235900.324 899934.313) |
# Read in MBTA rapid transit (T) routes, convert CRS, check CRS, and view info.
bos_rt_route = gpd.read_file('./data/mbta_rapid_transit/MBTA_ARC.shp')
bos_rt_route = bos_rt_route.to_crs(boston_zcta.crs)
print(bos_rt_route.crs)
bos_rt_route.info()
epsg:6491 <class 'geopandas.geodataframe.GeoDataFrame'> RangeIndex: 138 entries, 0 to 137 Data columns (total 5 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 LINE 138 non-null object 1 ROUTE 138 non-null object 2 GRADE 138 non-null int64 3 SHAPE_LEN 138 non-null float64 4 geometry 138 non-null geometry dtypes: float64(1), geometry(1), int64(1), object(2) memory usage: 5.5+ KB
# View first five rows.
bos_rt_route.head()
| LINE | ROUTE | GRADE | SHAPE_LEN | geometry | |
|---|---|---|---|---|---|
| 0 | SILVER | SL3 | 3 | 79.538315 | LINESTRING (238786.088 902727.200, 238786.654 ... |
| 1 | SILVER | SL3 | 1 | 439.557315 | LINESTRING (238823.606 904952.475, 238807.669 ... |
| 2 | SILVER | SL3 | 1 | 680.245154 | LINESTRING (237840.206 905243.356, 237838.544 ... |
| 3 | SILVER | SL3 | 1 | 590.780545 | LINESTRING (238411.712 905095.055, 238375.464 ... |
| 4 | SILVER | SL3 | 1 | 819.160963 | LINESTRING (239198.867 904254.943, 239205.640 ... |
Stops were read in and processed with read_n_clip. Routes were originally read in with gpd.read_file and converted to the proper CRS to preserve areas over or under water, but the extent greatly exceeded that of the Boston Region MPO. Therefore, routes were read in and processed with read_n_clip.
# read_n_clip Commuter Rail stops, check CRS, and view info.
bos_train_node = read_n_clip('./data/trains/TRAINS_NODE.shp', boston_zcta)
print(bos_train_node.crs)
bos_train_node.info()
epsg:6491 <class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 201 entries, 8 to 386 Data columns (total 7 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 STATION 201 non-null object 1 C_RAILSTAT 192 non-null object 2 AMTRAK 29 non-null object 3 MAP_STA 109 non-null object 4 LINE_BRNCH 198 non-null object 5 STATE 201 non-null object 6 geometry 201 non-null geometry dtypes: geometry(1), object(6) memory usage: 12.6+ KB
# View first five rows.
bos_train_node.head()
| STATION | C_RAILSTAT | AMTRAK | MAP_STA | LINE_BRNCH | STATE | geometry | |
|---|---|---|---|---|---|---|---|
| 8 | CPW-WJ | None | None | Y | None | MA | POINT (227365.266 926812.875) |
| 10 | BEMIS | None | None | Y | None | MA | POINT (224749.875 902015.500) |
| 22 | WILMINGTON | Y | None | None | LOWELL LINE | MA | POINT (226639.047 922000.000) |
| 23 | WEDGEMERE | Y | None | None | LOWELL LINE | MA | POINT (229604.469 910551.562) |
| 25 | ANDERSON/WOBURN | Y | Y | None | LOWELL LINE | MA | POINT (229272.906 918416.875) |
# read_n_clip Commuter Rail routes, check CRS, and view info.
bos_train_route = read_n_clip('./data/trains/TRAINS_RTE_TRAIN.shp', boston_zcta)
print(bos_train_route.crs)
bos_train_route.info()
epsg:6491 <class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 13 entries, 5 to 18 Data columns (total 4 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 COMM_LINE 13 non-null object 1 COMMRAIL 13 non-null object 2 SHAPE_LEN 13 non-null float64 3 geometry 13 non-null geometry dtypes: float64(1), geometry(1), object(2) memory usage: 520.0+ bytes
# View first five rows.
bos_train_route.head()
| COMM_LINE | COMMRAIL | SHAPE_LEN | geometry | |
|---|---|---|---|---|
| 5 | Foxboro | S | 13687.671713 | LINESTRING (222862.470 865507.550, 222835.953 ... |
| 7 | Fitchburg | Y | 79764.881838 | MULTILINESTRING ((236052.969 901842.601, 23605... |
| 8 | Worcester | Y | 71501.453958 | LINESTRING (195983.138 890657.274, 195990.469 ... |
| 9 | Franklin | Y | 63840.600094 | MULTILINESTRING ((204964.595 871050.334, 20499... |
| 10 | Greenbush | Y | 44690.142535 | MULTILINESTRING ((262245.430 881092.333, 26224... |
ax = boston_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
boston_water.plot(ax=ax, alpha=0.3)
bos_bus_stop.plot(ax=ax, color='red', markersize=1, label='Bus Stop')
bos_bus_route.plot(ax=ax, color='red', linewidth=0.5, label='Bus Route')
bos_rt_node.plot(ax=ax, color='green', markersize=15, label='T Stop')
bos_rt_route.plot(ax=ax, color='green', label='T Route')
bos_train_node.plot(ax=ax, color='purple', markersize=20, label='Commuter Rail Stop')
bos_train_route.plot(ax=ax, color='purple', label='Commuter Rail Route')
plt.title('Mass Transit in Boston Region MPO', fontsize=16)
plt.legend()
plt.show()
Judging by the vast extent of the commuter rail, and the density of bus and T stops, the outer ZCTAs within the Boston Region MPO are more untenable for frequent commutes to work or to campus. A mass transit density analysis that includes the outer ZCTAs will skew those within the range of the bus and the T, so they have been excluded from the study by limiting the extent to that of the T.
To limit the study area to the extent of the T for a more realistic comparison of ZCTAs, the following steps were used:
shapely.geometry.box.# Extract bounds of Boston Rapid Transit (T) nodes.
rt_bounds = bos_rt_node.geometry.total_bounds
rt_bounds
array([220391.71303283, 884240.96758161, 241840.82146882, 909753.44856537])
# Creating bounding box with shapely.geometry.box
# shapely.geometry.box(minx, miny, maxx, maxy, ccw=True)
rt_bound_box = box(rt_bounds[0], rt_bounds[1], rt_bounds[2], rt_bounds[3])
rt_bound_box
# Store the extent as a Shapely Polygon in a variable called graph_extent.
graph_extent = rt_bound_box.buffer(0.1, join_style=2)
graph_extent
# Extract Boston Area ZCTAs within the graph extent using the centroid of the ZCTAs.
rt_zcta_box = boston_zcta[boston_zcta.centroid.within(graph_extent)]
rt_zcta_box.info()
<class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 69 entries, 0 to 158 Data columns (total 13 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 index 69 non-null int64 1 STATEFP00 69 non-null object 2 ZCTA5CE00 69 non-null object 3 GEOID00 69 non-null object 4 CLASSFP00 69 non-null object 5 MTFCC00 69 non-null object 6 FUNCSTAT00 69 non-null object 7 ALAND00 69 non-null int64 8 AWATER00 69 non-null int64 9 INTPTLAT00 69 non-null object 10 INTPTLON00 69 non-null object 11 PARTFLG00 69 non-null object 12 geometry 69 non-null geometry dtypes: geometry(1), int64(3), object(9) memory usage: 7.5+ KB
# View first five rows of rt_zcta_box.
rt_zcta_box.head()
| index | STATEFP00 | ZCTA5CE00 | GEOID00 | CLASSFP00 | MTFCC00 | FUNCSTAT00 | ALAND00 | AWATER00 | INTPTLAT00 | INTPTLON00 | PARTFLG00 | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 31 | 25 | 02458 | 2502458 | B5 | G6350 | S | 4709183 | 57045 | +42.3541154 | -071.1891300 | N | MULTIPOLYGON (((227500.026 901300.219, 227482.... |
| 4 | 35 | 25 | 02141 | 2502141 | B5 | G6350 | S | 1572423 | 44304 | +42.3702998 | -071.0825603 | N | POLYGON ((234064.636 902597.603, 234080.228 90... |
| 5 | 36 | 25 | 02143 | 2502143 | B5 | G6350 | S | 4007477 | 0 | +42.3815721 | -071.0969953 | N | MULTIPOLYGON (((234059.089 902586.579, 234073.... |
| 6 | 37 | 25 | 02472 | 2502472 | B5 | G6350 | S | 10410330 | 355268 | +42.3694508 | -071.1779249 | N | POLYGON ((223853.273 902504.747, 223982.362 90... |
| 7 | 39 | 25 | 02464 | 2502464 | B5 | G6350 | S | 1447706 | 29494 | +42.3129751 | -071.2188818 | N | POLYGON ((223497.366 896100.959, 223538.895 89... |
# Plot the Boston Area ZCTAs within graph extent with the T to confirm success.
ax = rt_zcta_box.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
bos_rt_node.plot(ax=ax, color='green', markersize=15, label='T Stop')
bos_rt_route.plot(ax=ax, color='green', label='T Route')
plt.title('ZCTAs within Boston Area Rapid Transit Extent', fontsize=16)
plt.legend()
plt.show()
There are a number of ZCTAs that are within the defined rectangular bounds but are not close to a T stop. The extent was further limited to the convex hull of T stops with the following steps:
unary_union.convex_hull.# Create a convex hull from T stops.
convex_bounds = bos_rt_node.unary_union.convex_hull
convex_bounds
# Store the extent as a Shapely Polygon in a variable called convex_graph_extent.
convex_graph_extent = convex_bounds.buffer(0.1)
convex_graph_extent
# Extract Boston Area ZCTAs that intersect with the graph extent.
rt_zcta = boston_zcta[boston_zcta.intersects(convex_graph_extent)]
rt_zcta.info()
<class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 62 entries, 0 to 158 Data columns (total 13 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 index 62 non-null int64 1 STATEFP00 62 non-null object 2 ZCTA5CE00 62 non-null object 3 GEOID00 62 non-null object 4 CLASSFP00 62 non-null object 5 MTFCC00 62 non-null object 6 FUNCSTAT00 62 non-null object 7 ALAND00 62 non-null int64 8 AWATER00 62 non-null int64 9 INTPTLAT00 62 non-null object 10 INTPTLON00 62 non-null object 11 PARTFLG00 62 non-null object 12 geometry 62 non-null geometry dtypes: geometry(1), int64(3), object(9) memory usage: 6.8+ KB
# View first five rows of rt_zcta.
rt_zcta.head()
| index | STATEFP00 | ZCTA5CE00 | GEOID00 | CLASSFP00 | MTFCC00 | FUNCSTAT00 | ALAND00 | AWATER00 | INTPTLAT00 | INTPTLON00 | PARTFLG00 | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 31 | 25 | 02458 | 2502458 | B5 | G6350 | S | 4709183 | 57045 | +42.3541154 | -071.1891300 | N | MULTIPOLYGON (((227500.026 901300.219, 227482.... |
| 4 | 35 | 25 | 02141 | 2502141 | B5 | G6350 | S | 1572423 | 44304 | +42.3702998 | -071.0825603 | N | POLYGON ((234064.636 902597.603, 234080.228 90... |
| 5 | 36 | 25 | 02143 | 2502143 | B5 | G6350 | S | 4007477 | 0 | +42.3815721 | -071.0969953 | N | MULTIPOLYGON (((234059.089 902586.579, 234073.... |
| 6 | 37 | 25 | 02472 | 2502472 | B5 | G6350 | S | 10410330 | 355268 | +42.3694508 | -071.1779249 | N | POLYGON ((223853.273 902504.747, 223982.362 90... |
| 7 | 39 | 25 | 02464 | 2502464 | B5 | G6350 | S | 1447706 | 29494 | +42.3129751 | -071.2188818 | N | POLYGON ((223497.366 896100.959, 223538.895 89... |
# Plot the Boston Area ZCTAs within convex graph extent to confirm success.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
bos_rt_node.plot(ax=ax, color='green', markersize=15, label='T Stop')
bos_rt_route.plot(ax=ax, color='green', label='T Route')
plt.title('ZCTAs within Boston Area Rapid Transit Extent', fontsize=16)
plt.legend()
plt.show()
The following GDFs were clipped to the new rt_zcta extent:
boston_waterbos_bus_stopbos_bus_routebos_train_nodebos_train_routeBecause bos_rt_node was used to create the extent and bos_rt_route connects all T stops, bos_rt_route does not need to be clipped.
# Clip all relevant GDFs.
boston_water = gpd.clip(boston_water, rt_zcta)
bos_bus_stop = gpd.clip(bos_bus_stop, rt_zcta)
bos_bus_route = gpd.clip(bos_bus_route, rt_zcta)
bos_train_route = gpd.clip(bos_train_route, rt_zcta)
bos_train_node = gpd.clip(bos_train_node, rt_zcta)
# Plot new extent with mass transit.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
boston_water.plot(ax=ax, alpha=0.3)
bos_bus_stop.plot(ax=ax, color='red', markersize=1, label='Bus Stop')
bos_bus_route.plot(ax=ax, color='red', linewidth=0.5, label='Bus Route')
bos_rt_node.plot(ax=ax, color='green', markersize=15, label='T Stop')
bos_rt_route.plot(ax=ax, color='green', label='T Route')
bos_train_node.plot(ax=ax, color='purple', markersize=20, label='Commuter Rail Stop')
bos_train_route.plot(ax=ax, color='purple', label='Commuter Rail Route')
plt.title('Mass Transit within Boston Area Rapid Transit Extent', fontsize=16)
plt.legend()
plt.show()
To find the locations of Tufts University and Boston University (BU), the Massachusetts Colleges and Universities shapefile was processed with read_n_clip to read the shapefile and clip it to the extent of boston_zcta. Tufts and BU were then extracted into a GeoDataGrame.
Though using NetworkX to conduct a network analysis on mass transit nodes and routes was untenable for this project, a network analysis was conducted on the driveable network within the extent to find shortest paths between Tufts and BU. These paths are intended to be used as visual reference points for areas that are convenient across indicators for both a Tufts student and a BU student.
# read_n_clip colleges within rt_zta.
colleges = read_n_clip('./data/colleges/COLLEGES_PT.shp', rt_zcta)
print(colleges.crs)
colleges.info()
epsg:6491 <class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 85 entries, 50 to 199 Data columns (total 20 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 COLLEGE 85 non-null object 1 CAMPUS 34 non-null object 2 ADDRESS 85 non-null object 3 CITY 85 non-null object 4 ZIPCODE 85 non-null object 5 PLUS_FOUR 10 non-null object 6 GEOG_TOWN 85 non-null object 7 MAIN_TEL 85 non-null object 8 URL 85 non-null object 9 NCES_ID 82 non-null object 10 TYPE 85 non-null object 11 NCES_TYPE 85 non-null object 12 CATEGORY 85 non-null object 13 DEGREEOFFR 85 non-null object 14 AWARDSOFFR 85 non-null object 15 LARGEPROG 48 non-null object 16 CAMPUSSETT 85 non-null object 17 CAMPUSHOUS 85 non-null object 18 L_SRC 85 non-null object 19 geometry 85 non-null geometry dtypes: geometry(1), object(19) memory usage: 13.9+ KB
# Plot colleges with the basemap.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
boston_water.plot(ax=ax, alpha=0.3)
colleges.plot(ax=ax, color='maroon')
plt.title('Colleges in Boston Area', fontsize=16)
plt.show()
# List all college names.
college_list = list(colleges.COLLEGE.unique())
college_list
['Toni & Guy Hairdressing Academy', 'University of Phoenix-Boston', 'Curry College', 'Mansfield Beauty Schools', 'Massachusetts School of Barbering', 'Newbury College', 'Boston Career Institute', 'Boston College', 'Harvard University', 'Boston University School of Medicine', 'Massachusetts College of Pharmacy and Health Science', 'Massachusetts College of Art and Design', 'Wentworth Institute of Technology', 'School of the Museum of Fine Arts', 'Boston Graduate School of Psychoanalysis Inc', 'Simmons College', 'Northeastern University Professional Advancement Network', 'Northeastern University', 'New England Conservatory of Music', 'Emmanuel College', 'Ali May Academy', 'Quincy College', 'Boston Baptist College', 'Laboure College', 'Eastern Nazarene College', 'William James College', 'Jupiter Beauty Academy', 'University of Massachusetts Boston', 'Hellenic College/Holy Cross', 'Pine Manor College', 'Andover Newton Theological School', 'Hebrew College', 'Gordon-Conwell Theological Seminary', 'Roxbury Community College', 'Springfield College School of Professional and Continuing Studies', 'Lincoln Technical Institute', 'Tufts University', "Flavia Leal Beauty Creator's Academy", 'Elizabeth Grady School of Esthetics and Massage Therapy', 'Lawrence Memorial Hospital/Regis College School of Nursing', 'Empire Beauty School', 'Salter School', 'New England Hair Academy', 'Wheelock College', 'Lasell College', "St. John's Seminary College", 'Benjamin Franklin Institute of Technology', 'The Boston Conservatory at Berklee', 'Berklee College of Music', 'Boston Architectural College', 'Boston University', 'American Academy of Personal Training', 'New England Law Boston', 'Bay State College', 'New England College of Optometry', 'Emerson College', 'Urban College of Boston', 'New England College of Business', 'Fisher College', 'Suffolk University', 'Massachusetts Institute of Technology', 'Massachusetts General Hospital Dietetic Internship', 'North Bennet Street School', 'Cambridge College', 'Cambridge College School of Management and Education', 'Hult International Business School', 'MGH Institute of Health Professions', 'Harvard College', 'Bunker Hill Community College', 'Longy School of Music of Bard College', 'Lesley University', 'East Boston Beauty Academy', 'Harvard-Smithsonian']
# Select only names matching Tufts University or Boston University.
colleges_select = colleges.loc[colleges.COLLEGE.isin(['Tufts University', 'Boston University'])]
colleges_select
| COLLEGE | CAMPUS | ADDRESS | CITY | ZIPCODE | PLUS_FOUR | GEOG_TOWN | MAIN_TEL | URL | NCES_ID | TYPE | NCES_TYPE | CATEGORY | DEGREEOFFR | AWARDSOFFR | LARGEPROG | CAMPUSSETT | CAMPUSHOUS | L_SRC | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 136 | Tufts University | Medford/Somerville Campus | 419 Boston Avenue | Medford | 02155 | None | MEDFORD | (617) 628-5000 | http://www.tufts.edu | 168148 | PRI | 4-year, Private not-for-profit | Research University | C, B, PBC, M, PMC, D | Two but less than 4 years certificate;Bachelor... | None | Suburb: Large | Yes | nces.ed.gov | POINT (231416.402 906514.073) |
| 163 | Boston University | Charles River Campus | 1 Silber Way | Boston | 02215 | None | BOSTON | (617) 353-2000 | http://www.bu.edu | 164988 | PRI | 4-year, Private not-for-profit | Research University | C, B, PBC, M, PMC, D | Less than one year certificate;One but less th... | None | City: Large | Yes | nces.ed.gov | POINT (232983.739 899977.282) |
| 164 | Tufts University | Boston Campus | 145 Harrison Avenue | Boston | 02111 | None | BOSTON | (617) 636-7000 | http://medicine.tufts.edu/ | 168148 | PRI | 4-year, Private not-for-profit | Research University | C, B, PBC, M, PMC, D | Two but less than 4 years certificate;Bachelor... | None | Suburb: Large | Yes | nces.ed.gov | POINT (236057.132 900054.525) |
# Select only the Medford/Somerville Tufts Campus and the main BU Campus.
tufts_bu = colleges_select.iloc[[0, 1]]
tufts_bu
| COLLEGE | CAMPUS | ADDRESS | CITY | ZIPCODE | PLUS_FOUR | GEOG_TOWN | MAIN_TEL | URL | NCES_ID | TYPE | NCES_TYPE | CATEGORY | DEGREEOFFR | AWARDSOFFR | LARGEPROG | CAMPUSSETT | CAMPUSHOUS | L_SRC | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 136 | Tufts University | Medford/Somerville Campus | 419 Boston Avenue | Medford | 02155 | None | MEDFORD | (617) 628-5000 | http://www.tufts.edu | 168148 | PRI | 4-year, Private not-for-profit | Research University | C, B, PBC, M, PMC, D | Two but less than 4 years certificate;Bachelor... | None | Suburb: Large | Yes | nces.ed.gov | POINT (231416.402 906514.073) |
| 163 | Boston University | Charles River Campus | 1 Silber Way | Boston | 02215 | None | BOSTON | (617) 353-2000 | http://www.bu.edu | 164988 | PRI | 4-year, Private not-for-profit | Research University | C, B, PBC, M, PMC, D | Less than one year certificate;One but less th... | None | City: Large | Yes | nces.ed.gov | POINT (232983.739 899977.282) |
# Plot Tufts and BU on top of the base map.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
boston_water.plot(ax=ax, alpha=0.3)
tufts_bu.plot(ax=ax, color='maroon', markersize=50)
plt.title('Tufts University and Boston Universtiy Over Boston Area', fontsize=16)
plt.show()
A network analysis was conducted to find the shortest path between Tufts and BU with the following steps:
unary_union with a small buffer to account for any nodes or edges over water.ox.graph_from_polygon.ox.get_nearest_node.nx.shortest_path.rt_zcta to use later in the study.graph_from_polygon¶# Convert the ZCTA extent to a polygon with buffer to include network over water.
zcta_bounds = rt_zcta.to_crs('epsg:4326').unary_union.buffer(0.003)
zcta_bounds
# Create and store the street network in the extent in a variable called graph:
graph = ox.graph_from_polygon(zcta_bounds, network_type='drive')
# Plot the street network within the graph extent.
fig, ax = ox.plot_graph(graph)
# Reproject the graph of the street network.
graph_proj = ox.project_graph(graph)
# View type of the projection.
type(graph_proj)
networkx.classes.multidigraph.MultiDiGraph
# Extract the nodes and edges of the reprojected graph to GeoDataFrames.
nodes_proj, edges_proj = ox.graph_to_gdfs(graph_proj, nodes=True, edges=True)
# View the first five rows of the projected nodes GeoDataFrame.
nodes_proj.head()
| y | x | street_count | lon | lat | highway | ref | geometry | |
|---|---|---|---|---|---|---|---|---|
| osmid | ||||||||
| 30730954 | 4.692573e+06 | 333521.887579 | 3 | -71.021817 | 42.367608 | NaN | NaN | POINT (333521.888 4692572.502) |
| 61441677 | 4.692605e+06 | 333528.489931 | 3 | -71.021746 | 42.367901 | NaN | NaN | POINT (333528.490 4692604.832) |
| 1102741801 | 4.692474e+06 | 333304.232697 | 3 | -71.024430 | 42.366676 | traffic_signals | NaN | POINT (333304.233 4692474.133) |
| 61151272 | 4.694337e+06 | 327625.851751 | 3 | -71.093904 | 42.382206 | NaN | NaN | POINT (327625.852 4694337.231) |
| 61151274 | 4.694292e+06 | 327673.152267 | 3 | -71.093317 | 42.381811 | NaN | NaN | POINT (327673.152 4694292.121) |
# View info for the nodes.
nodes_proj.info()
<class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 32772 entries, 30730954 to 8620784566 Data columns (total 8 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 y 32772 non-null float64 1 x 32772 non-null float64 2 street_count 32772 non-null int64 3 lon 32772 non-null float64 4 lat 32772 non-null float64 5 highway 2805 non-null object 6 ref 90 non-null object 7 geometry 32772 non-null geometry dtypes: float64(4), geometry(1), int64(1), object(2) memory usage: 2.3+ MB
# View the CRS for the projected edges GeoDataFrame.
edges_proj.crs
<Projected CRS: +proj=utm +zone=19 +ellps=WGS84 +datum=WGS84 +unit ...> Name: unknown Axis Info [cartesian]: - E[east]: Easting (metre) - N[north]: Northing (metre) Area of Use: - undefined Coordinate Operation: - name: UTM zone 19N - method: Transverse Mercator Datum: World Geodetic System 1984 - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
# View the first five rows of the edges.
edges_proj.head()
| osmid | oneway | lanes | highway | maxspeed | length | geometry | name | width | ref | bridge | access | junction | tunnel | service | |||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| u | v | key | |||||||||||||||
| 30730954 | 61441677 | 0 | 197230699 | True | 2 | residential | 25 mph | 33.072 | LINESTRING (333521.888 4692572.502, 333524.915... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1102741801 | 0 | 197230701 | False | NaN | residential | 25 mph | 278.885 | LINESTRING (333521.888 4692572.502, 333519.804... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
| 61441677 | 330143703 | 0 | 721665439 | False | 2 | tertiary | NaN | 11.786 | LINESTRING (333528.490 4692604.832, 333516.798... | Airport Way | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 61441570 | 0 | 721665438 | False | 2 | tertiary | NaN | 199.190 | LINESTRING (333528.490 4692604.832, 333567.428... | Airport Way | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
| 1102741801 | 61429113 | 0 | [8644985, 591677313, 426657281] | False | [2, 4] | secondary | NaN | 222.713 | LINESTRING (333304.233 4692474.133, 333305.544... | Hotel Drive | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
# Convert Tufts and BU GeoDataFrame into the projected CRS.
tufts_bu_proj = tufts_bu.to_crs(edges_proj.crs)
tufts_bu_proj
| COLLEGE | CAMPUS | ADDRESS | CITY | ZIPCODE | PLUS_FOUR | GEOG_TOWN | MAIN_TEL | URL | NCES_ID | TYPE | NCES_TYPE | CATEGORY | DEGREEOFFR | AWARDSOFFR | LARGEPROG | CAMPUSSETT | CAMPUSHOUS | L_SRC | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 136 | Tufts University | Medford/Somerville Campus | 419 Boston Avenue | Medford | 02155 | None | MEDFORD | (617) 628-5000 | http://www.tufts.edu | 168148 | PRI | 4-year, Private not-for-profit | Research University | C, B, PBC, M, PMC, D | Two but less than 4 years certificate;Bachelor... | None | Suburb: Large | Yes | nces.ed.gov | POINT (325686.650 4697307.813) |
| 163 | Boston University | Charles River Campus | 1 Silber Way | Boston | 02215 | None | BOSTON | (617) 353-2000 | http://www.bu.edu | 164988 | PRI | 4-year, Private not-for-profit | Research University | C, B, PBC, M, PMC, D | Less than one year certificate;One but less th... | None | City: Large | Yes | nces.ed.gov | POINT (327061.153 4690727.764) |
# Plot the street network with Tufts and BU.
fig, ax = plt.subplots(figsize=(12, 12))
edges_proj.plot(ax=ax, color='grey', linewidth=0.5, alpha=0.7)
tufts_bu_proj.plot(ax=ax, color='maroon', label='Tufts and BU')
plt.title('Tufts University and Boston University on Boston Area Street Network', fontsize=16)
plt.legend()
plt.show()
# Find the nearest nodes for Tufts and BU and view.
for idx, row in tufts_bu_proj.iterrows():
geometry = (row.geometry.y, row.geometry.x)
school_node_id = ox.get_nearest_node(graph_proj, geometry, method = 'euclidean')
tufts_bu_proj.loc[idx, 'node_id'] = school_node_id
tufts_bu_proj = tufts_bu_proj.reset_index()
tufts_bu_proj
| index | COLLEGE | CAMPUS | ADDRESS | CITY | ZIPCODE | PLUS_FOUR | GEOG_TOWN | MAIN_TEL | URL | ... | NCES_TYPE | CATEGORY | DEGREEOFFR | AWARDSOFFR | LARGEPROG | CAMPUSSETT | CAMPUSHOUS | L_SRC | geometry | node_id | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 136 | Tufts University | Medford/Somerville Campus | 419 Boston Avenue | Medford | 02155 | None | MEDFORD | (617) 628-5000 | http://www.tufts.edu | ... | 4-year, Private not-for-profit | Research University | C, B, PBC, M, PMC, D | Two but less than 4 years certificate;Bachelor... | None | Suburb: Large | Yes | nces.ed.gov | POINT (325686.650 4697307.813) | 66486437.0 |
| 1 | 163 | Boston University | Charles River Campus | 1 Silber Way | Boston | 02215 | None | BOSTON | (617) 353-2000 | http://www.bu.edu | ... | 4-year, Private not-for-profit | Research University | C, B, PBC, M, PMC, D | Less than one year certificate;One but less th... | None | City: Large | Yes | nces.ed.gov | POINT (327061.153 4690727.764) | 61450281.0 |
2 rows × 22 columns
# Extract node IDs for Tufts and BU.
tufts_node = tufts_bu_proj.loc[0, 'node_id']
bu_node = tufts_bu_proj.loc[1, 'node_id']
# Find shortest paths in both directions by length.
tufts_bu_path_length = nx.shortest_path(G=graph_proj, source=tufts_node, target=bu_node, weight='length')
bu_tufts_path_length = nx.shortest_path(G=graph_proj, source=bu_node, target=tufts_node, weight='length')
# Find shortest paths in both directions by time.
tufts_bu_path_time = nx.shortest_path(G=graph_proj, source=tufts_node, target=bu_node, weight='time')
bu_tufts_path_time = nx.shortest_path(G=graph_proj, source=bu_node, target=tufts_node, weight='time')
# Construct a list with all paths.
tufts_bu_paths_list = [tufts_bu_path_length, bu_tufts_path_length, tufts_bu_path_time, bu_tufts_path_time]
# Plot the paths on top of the projected graph network.
fig, ax = ox.plot_graph_routes(graph_proj, tufts_bu_paths_list)
# Create a Pandas Series containing the paths.
paths_ser = pd.Series(tufts_bu_paths_list)
paths_ser
0 [66486437.0, 66481898, 66451176, 66468266, 664... 1 [61450281.0, 61452326, 61367805, 61447017, 613... 2 [66486437.0, 66481898, 66463748, 66423089, 664... 3 [61450281.0, 61452326, 61367805, 61447017, 249... dtype: object
# Create a GeoPandas GeoDataFrame with paths and path info.
paths_gdf = gpd.GeoDataFrame({'origin':['Tufts', 'BU', 'Tufts', 'BU'],
'destination':['BU', 'Tufts', 'BU', 'Tufts'],
'path':paths_ser})
paths_gdf.info()
<class 'geopandas.geodataframe.GeoDataFrame'> RangeIndex: 4 entries, 0 to 3 Data columns (total 3 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 origin 4 non-null object 1 destination 4 non-null object 2 path 4 non-null object dtypes: object(3) memory usage: 224.0+ bytes
# Create empty GeoDataFrame to add path LineStrings and lengths to.
paths_geom = gpd.GeoDataFrame()
# Iterate through each path in the paths DataFrame
for idx, row in paths_gdf.iterrows():
# Find the nodes belonging to the path.
path_nodes = nodes_proj.loc[paths_gdf.loc[idx, 'path']]
# Create a LineString out of the Point geometries of the nodes.
path_line = LineString(list(path_nodes.geometry.values))
# Create a GeoDataFrame for the path.
path_geom = gpd.GeoDataFrame([[path_line]], geometry='geometry', crs=edges_proj.crs, columns=['geometry'])
# Add a list of osmids associated with the path.
path_geom.loc[0, 'osmids'] = str(list(path_nodes.index.values))
# Calculate the path length.
path_geom['path_dist'] = path_geom.length
# Append the path to the paths_geom GeoDataFrame.
paths_geom = paths_geom.append(path_geom, ignore_index=True)
# View the paths_geom GeoDataFrame to confirm.
paths_geom
| geometry | osmids | path_dist | |
|---|---|---|---|
| 0 | LINESTRING (325811.455 4697322.197, 325743.264... | [66486437.0, 66481898.0, 66451176.0, 66468266.... | 9440.247976 |
| 1 | LINESTRING (326986.857 4690706.096, 326736.972... | [61450281.0, 61452326.0, 61367805.0, 61447017.... | 9390.336903 |
| 2 | LINESTRING (325811.455 4697322.197, 325743.264... | [66486437.0, 66481898.0, 66463748.0, 66423089.... | 12412.504889 |
| 3 | LINESTRING (326986.857 4690706.096, 326736.972... | [61450281.0, 61452326.0, 61367805.0, 61447017.... | 14689.466109 |
# Join paths_gdf and paths_geom on the index.
tufts_bu_paths_proj = paths_gdf.join(paths_geom)
tufts_bu_paths_proj
| origin | destination | path | geometry | osmids | path_dist | |
|---|---|---|---|---|---|---|
| 0 | Tufts | BU | [66486437.0, 66481898, 66451176, 66468266, 664... | LINESTRING (325811.455 4697322.197, 325743.264... | [66486437.0, 66481898.0, 66451176.0, 66468266.... | 9440.247976 |
| 1 | BU | Tufts | [61450281.0, 61452326, 61367805, 61447017, 613... | LINESTRING (326986.857 4690706.096, 326736.972... | [61450281.0, 61452326.0, 61367805.0, 61447017.... | 9390.336903 |
| 2 | Tufts | BU | [66486437.0, 66481898, 66463748, 66423089, 664... | LINESTRING (325811.455 4697322.197, 325743.264... | [66486437.0, 66481898.0, 66463748.0, 66423089.... | 12412.504889 |
| 3 | BU | Tufts | [61450281.0, 61452326, 61367805, 61447017, 249... | LINESTRING (326986.857 4690706.096, 326736.972... | [61450281.0, 61452326.0, 61367805.0, 61447017.... | 14689.466109 |
print('The shortest path by distance is',
round((tufts_bu_paths_proj.loc[1, 'path_dist'])/1000/1.61, 1),
'miles from BU to Tufts.')
print('The shortest path by time is',
round((tufts_bu_paths_proj.loc[2, 'path_dist'])/1000/1.61, 1),
'miles from Tufts to BU.')
The shortest path by distance is 5.8 miles from BU to Tufts. The shortest path by time is 7.7 miles from Tufts to BU.
# Plot the shortest paths on the street network.
fig, ax = plt.subplots(figsize=(12, 12))
edges_proj.plot(ax=ax, color='grey', linewidth=0.5, alpha=0.7)
tufts_bu_paths_proj.plot(ax=ax, color='red')
tufts_bu_proj.plot(ax=ax, color='maroon', zorder=10, label='Tufts and BU')
plt.title('Shortest Paths by Length and Time between Tufts and BU', fontsize=16)
plt.legend()
plt.show()
The paths GDF was converted to use as visual reference points for areas that are convenient across indicators for both a Tufts student and a BU student. Though they will not match bus routes, they can be used as a very rough proxy for areas that buses would be convenient. Locations near the shortest paths are preferable to ones farther from the paths.
tufts_bu_paths = tufts_bu_paths_proj.to_crs(rt_zcta.crs)
As the study's premise is living without a car, it is crucial that the home's location be easily accessible by mass transit. Density of transit stops was selected as an indicator for convenience of mass transit. While the routes are also important to consider, the stops are the on-off points to transit lines and necessary to accessing the transit systems.
The density of mass transit stops was calculated with the following steps:
count_records was created to count the number of records in a GDF within polygons of another GDF (e.g. number of bus stops within a ZCTA).count_records was used on the GDFs for bus stops, T stops, and train stops in the limited rt_zcta extent.multimerge was created to merge multiple DataFrames on the same column or list of columns. multimerge was used to add all mass transit stop counts to rt_zcta quantiles was created to find quantile values based on quantile thresholds for a set of values. Quantile values divide the dataset into equally sized segments. The results of this function are then used to reclassify data.reclass_5 was created to reclassify a value into one of five classes given thresholds for reclassifications (e.g. those found with quantiles.count_records¶count_records takes a GeoDataFrame and counts the number of records within another GDF of polygons. It outputs a DataFrame with the specified polygon column values and counts column. The optional argument op for gpd.sjoin defaults to 'within' unless otherwise specified.
def count_records(records_gdf, polygon_gdf, polygon_col, count_col, op='within'):
"""
Takes a GeoDataFrame and counts the number of records within another GDF of polygons.
Outputs a DataFrame with the specified polygon column values and counts.
Optional argument op defaults to 'within' unless otherwise specified.
Requires Pandas and GeoPandas to run.
Inputs:
records_gdf = GDF of records to count
polygon_gdf = GDF of polygons to count from
polygon_col = name of column in polygon_gdf
e.g. 'name'
count_col = name of column for counts in output
e.g. 'tree_count'
op = op argument for sjoin; defaults to 'within' unless otherwise specified
Example:
>>> tree_count = count_records(trees, towns, 'name', 'tree_count')
>>> tree_count
name tree_count
0 Plainsville 68
1 Springfield 40
2 Fairfield 81
3 Greenville 105
"""
# Conduct a spatial join of records and polygons.
spatial_join = gpd.sjoin(records_gdf, polygon_gdf, how='left', op=op)
# Count the number of records within each polygon.
records_count = spatial_join[polygon_col].value_counts().reset_index()
# Add columns to the new DF.
records_count.columns = [polygon_col, count_col]
return records_count
# count_records for bus stops within rt_zcta.
zcta_bus_count = count_records(bos_bus_stop, rt_zcta, 'ZCTA5CE00', 'bus_stop_count')
zcta_bus_count.describe()
| bus_stop_count | |
|---|---|
| count | 60.000000 |
| mean | 81.566667 |
| std | 67.974663 |
| min | 1.000000 |
| 25% | 28.750000 |
| 50% | 76.500000 |
| 75% | 108.000000 |
| max | 334.000000 |
# count_records for T stops within rt_zcta.
zcta_rt_count = count_records(bos_rt_node, rt_zcta, 'ZCTA5CE00', 'rt_stop_count')
zcta_rt_count.describe()
| rt_stop_count | |
|---|---|
| count | 44.000000 |
| mean | 3.772727 |
| std | 3.678117 |
| min | 1.000000 |
| 25% | 1.000000 |
| 50% | 2.000000 |
| 75% | 5.250000 |
| max | 15.000000 |
# count_records for Commuter Rail stops within rt_zcta.
zcta_train_count = count_records(bos_train_node, rt_zcta, 'ZCTA5CE00', 'train_stop_count')
zcta_train_count.describe()
| train_stop_count | |
|---|---|
| count | 27.000000 |
| mean | 2.962963 |
| std | 2.608882 |
| min | 1.000000 |
| 25% | 1.500000 |
| 50% | 2.000000 |
| 75% | 3.500000 |
| max | 10.000000 |
multimerge¶multimerge takes a base DataFrame and merges it with each DataFrame in a list of DataFrames on the specified column or list of columns and with the specified how. The function assumes the specified column(s) exist(s) across all DataFrames.
def multimerge(left_df, df_list, on_col, how):
"""
Takes a base DataFrame and merges with each DataFrame in a list of DataFrames
on the specified column or list of columns and with the specified 'how'.
Assumes on_col exists across all DFs.
Requires Pandas to run merge method.
Inputs:
left_df = base DF
df_list = list of DFs
e.g. [df1, df2, df3]
on_col = bracketed column name or list of columns (same across DFs)
e.g. ['name'], ['name', 'address', 'zip_code']
how = how argument
e.g. 'left'
Example:
>>> town_schools = multimerge(town, [elem, middle, high], ['town_name'], 'left')
"""
# Create a copy of the base DataFrame.
merge_df = left_df.copy()
# Merge each DataFrame within the list.
for i in range(len(df_list)):
merge_df = merge_df.merge(df_list[i], on=on_col, how=how)
return merge_df
# Merge rt_zcta with all transit stop counts.
count_list = [zcta_bus_count, zcta_rt_count, zcta_train_count]
zcta_nodes = multimerge(rt_zcta, count_list, ['ZCTA5CE00'], 'left').fillna(0)
zcta_nodes.info()
<class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 62 entries, 0 to 61 Data columns (total 16 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 index 62 non-null int64 1 STATEFP00 62 non-null object 2 ZCTA5CE00 62 non-null object 3 GEOID00 62 non-null object 4 CLASSFP00 62 non-null object 5 MTFCC00 62 non-null object 6 FUNCSTAT00 62 non-null object 7 ALAND00 62 non-null int64 8 AWATER00 62 non-null int64 9 INTPTLAT00 62 non-null object 10 INTPTLON00 62 non-null object 11 PARTFLG00 62 non-null object 12 geometry 62 non-null geometry 13 bus_stop_count 62 non-null float64 14 rt_stop_count 62 non-null float64 15 train_stop_count 62 non-null float64 dtypes: float64(3), geometry(1), int64(3), object(9) memory usage: 8.2+ KB
# View first five rows of new GDF.
zcta_nodes.head()
| index | STATEFP00 | ZCTA5CE00 | GEOID00 | CLASSFP00 | MTFCC00 | FUNCSTAT00 | ALAND00 | AWATER00 | INTPTLAT00 | INTPTLON00 | PARTFLG00 | geometry | bus_stop_count | rt_stop_count | train_stop_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 31 | 25 | 02458 | 2502458 | B5 | G6350 | S | 4709183 | 57045 | +42.3541154 | -071.1891300 | N | MULTIPOLYGON (((227500.026 901300.219, 227482.... | 53.0 | 0.0 | 2.0 |
| 1 | 35 | 25 | 02141 | 2502141 | B5 | G6350 | S | 1572423 | 44304 | +42.3702998 | -071.0825603 | N | POLYGON ((234064.636 902597.603, 234080.228 90... | 23.0 | 1.0 | 0.0 |
| 2 | 36 | 25 | 02143 | 2502143 | B5 | G6350 | S | 4007477 | 0 | +42.3815721 | -071.0969953 | N | MULTIPOLYGON (((234059.089 902586.579, 234073.... | 108.0 | 0.0 | 0.0 |
| 3 | 37 | 25 | 02472 | 2502472 | B5 | G6350 | S | 10410330 | 355268 | +42.3694508 | -071.1779249 | N | POLYGON ((223853.273 902504.747, 223982.362 90... | 99.0 | 0.0 | 1.0 |
| 4 | 39 | 25 | 02464 | 2502464 | B5 | G6350 | S | 1447706 | 29494 | +42.3129751 | -071.2188818 | N | POLYGON ((223497.366 896100.959, 223538.895 89... | 22.0 | 0.0 | 0.0 |
# Calculate total transit stops in each ZCTA.
zcta_nodes['nodes_count'] = zcta_nodes.bus_stop_count + zcta_nodes.rt_stop_count + zcta_nodes.train_stop_count
zcta_nodes
| index | STATEFP00 | ZCTA5CE00 | GEOID00 | CLASSFP00 | MTFCC00 | FUNCSTAT00 | ALAND00 | AWATER00 | INTPTLAT00 | INTPTLON00 | PARTFLG00 | geometry | bus_stop_count | rt_stop_count | train_stop_count | nodes_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 31 | 25 | 02458 | 2502458 | B5 | G6350 | S | 4709183 | 57045 | +42.3541154 | -071.1891300 | N | MULTIPOLYGON (((227500.026 901300.219, 227482.... | 53.0 | 0.0 | 2.0 | 55.0 |
| 1 | 35 | 25 | 02141 | 2502141 | B5 | G6350 | S | 1572423 | 44304 | +42.3702998 | -071.0825603 | N | POLYGON ((234064.636 902597.603, 234080.228 90... | 23.0 | 1.0 | 0.0 | 24.0 |
| 2 | 36 | 25 | 02143 | 2502143 | B5 | G6350 | S | 4007477 | 0 | +42.3815721 | -071.0969953 | N | MULTIPOLYGON (((234059.089 902586.579, 234073.... | 108.0 | 0.0 | 0.0 | 108.0 |
| 3 | 37 | 25 | 02472 | 2502472 | B5 | G6350 | S | 10410330 | 355268 | +42.3694508 | -071.1779249 | N | POLYGON ((223853.273 902504.747, 223982.362 90... | 99.0 | 0.0 | 1.0 | 100.0 |
| 4 | 39 | 25 | 02464 | 2502464 | B5 | G6350 | S | 1447706 | 29494 | +42.3129751 | -071.2188818 | N | POLYGON ((223497.366 896100.959, 223538.895 89... | 22.0 | 0.0 | 0.0 | 22.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 57 | 482 | 25 | 02139 | 2502139 | B5 | G6350 | S | 3784723 | 368360 | +42.3622881 | -071.1037905 | N | POLYGON ((232130.866 902283.249, 232144.254 90... | 81.0 | 1.0 | 0.0 | 82.0 |
| 58 | 484 | 25 | 02474 | 2502474 | B5 | G6350 | S | 7903556 | 518211 | +42.4209494 | -071.1563695 | N | POLYGON ((228762.296 906831.977, 228756.803 90... | 79.0 | 0.0 | 0.0 | 79.0 |
| 59 | 493 | 25 | 02111 | 2502111 | B5 | G6350 | S | 754214 | 79209 | +42.3487843 | -071.0589880 | N | POLYGON ((236144.915 900687.565, 236153.840 90... | 13.0 | 8.0 | 9.0 | 30.0 |
| 60 | 494 | 25 | 02130 | 2502130 | B5 | G6350 | S | 11393804 | 369565 | +42.3072770 | -071.1140555 | N | POLYGON ((231406.529 893484.217, 231413.864 89... | 97.0 | 7.0 | 3.0 | 107.0 |
| 61 | 495 | 25 | 02222 | 2502222 | B5 | G6350 | S | 77896 | 0 | +42.3664695 | -071.0624939 | N | POLYGON ((235839.790 902134.685, 235852.488 90... | 0.0 | 0.0 | 10.0 | 10.0 |
62 rows × 17 columns
# Map the number of transit stops in each ZCTA.
ax = zcta_nodes.plot(column='nodes_count',
legend=True,
edgecolor='black',
cmap='OrRd',
figsize=(12, 12),
legend_kwds={'label': "Number of Mass Transit Stops"})
tufts_bu.plot(ax=ax, color='aquamarine', markersize=50, zorder=10, label='Tufts and BU')
tufts_bu_paths.plot(ax=ax, color='aquamarine', alpha=0.5, label='Shortest Paths')
plt.title('Number of Mass Transit Stops by ZCTA in Boston Area', fontsize=16)
plt.legend()
plt.show()
# Calculate node density in nodes/sqkm.
zcta_nodes['nodes_density'] = zcta_nodes.nodes_count/zcta_nodes.area*(10**6)
zcta_nodes.sort_values(by='nodes_density', ascending=False).head()
| index | STATEFP00 | ZCTA5CE00 | GEOID00 | CLASSFP00 | MTFCC00 | FUNCSTAT00 | ALAND00 | AWATER00 | INTPTLAT00 | INTPTLON00 | PARTFLG00 | geometry | bus_stop_count | rt_stop_count | train_stop_count | nodes_count | nodes_density | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 61 | 495 | 25 | 02222 | 2502222 | B5 | G6350 | S | 77896 | 0 | +42.3664695 | -071.0624939 | N | POLYGON ((235839.790 902134.685, 235852.488 90... | 0.0 | 0.0 | 10.0 | 10.0 | 128.417764 |
| 59 | 493 | 25 | 02111 | 2502111 | B5 | G6350 | S | 754214 | 79209 | +42.3487843 | -071.0589880 | N | POLYGON ((236144.915 900687.565, 236153.840 90... | 13.0 | 8.0 | 9.0 | 30.0 | 39.540383 |
| 49 | 378 | 25 | 02118 | 2502118 | B5 | G6350 | S | 2747439 | 0 | +42.3378630 | -071.0708157 | N | POLYGON ((235192.725 898164.265, 235174.169 89... | 87.0 | 15.0 | 1.0 | 103.0 | 37.491925 |
| 33 | 253 | 25 | 02109 | 2502109 | B5 | G6350 | S | 427819 | 0 | +42.3626531 | -071.0538044 | N | MULTIPOLYGON (((236437.338 900792.044, 236429.... | 13.0 | 1.0 | 0.0 | 14.0 | 33.228131 |
| 29 | 249 | 25 | 02119 | 2502119 | B5 | G6350 | S | 4153196 | 0 | +42.3238812 | -071.0853318 | N | POLYGON ((235115.975 898168.657, 235148.036 89... | 120.0 | 2.0 | 0.0 | 122.0 | 29.376895 |
# View statistics for nodes_density.
zcta_nodes.nodes_density.describe()
count 62.000000 mean 17.298751 std 16.739132 min 0.478677 25% 9.111375 50% 14.511331 75% 20.553204 max 128.417764 Name: nodes_density, dtype: float64
# View histogram for distribution of nodes_density.
sns.histplot(data=zcta_nodes, x='nodes_density')
plt.title('Distribution of Mass Transit Density per ZCTA in Boston Area')
plt.show()
The top value for nodes_density greatly exceeds the next value, despite having a low nodes_count, indicating it is an outlier. The ZCTA in question, ZCTA 02222, appears to contain only TD Garden and North Station. Though setting the nodes_density value for ZCTA 02222 to the median value to prevent skewing the analysis was contemplated, the decision was ultimately made to allow the outlier as reclassification would nearly negate the skew.
# Map the density of transit nodes in each ZCTA.
ax = zcta_nodes.plot(column='nodes_density',
legend=True,
edgecolor='black',
cmap='YlGnBu',
figsize=(12, 12),
legend_kwds={'label': "Mass Transit Stops per sqkm"})
tufts_bu.plot(ax=ax, color='maroon', markersize=50, zorder=10, label='Tufts and BU')
tufts_bu_paths.plot(ax=ax, color='maroon', alpha=0.5, label='Shortest Paths')
plt.title('Density of Mass Transit Stops by ZCTA in Boston Area', fontsize=16)
plt.legend()
plt.show()
To reclassify indicators, quantile values need to be found for each indicator and used to reclassify values in roughly equal segments. The quantiles function was created to find any number of quantiles, while the reclass_5 function was created to reclassify indicators into five classes.
quantiles and reclass_5¶quantiles takes a DataFrame, a column name, and a list of quantile thresholds (between 0 and 1) and outputs a list of quantile values for the column.
def quantiles(df, col, threshold_list):
"""
Takes a DataFrame, a column name, and a list of quantile thresholds
(between 0 and 1) and outputs a list of quantile values for the column.
Requires NumPy to run numpy.quantile function.
Inputs:
df = DataFrame variable name
col = column name as a string
e.g. 'mean'
threshold list = bracketed list of quantile thresholds between 0 and 1
e.g. [0.25, 0.5, 0.75], [0.2, 0.4, 0.6, 0.8]
Example:
>>> quarts = [0.25, 0.5, 0.75]
>>> quantiles(student, 'grades', quarts)
[65.6, 80.5, 88.0]
"""
# Create empty list.
quant_list = []
# Calculate and append quantiles.
for i in range(len(threshold_list)):
quant_list.append(np.quantile(df[col], threshold_list[i]))
return quant_list
reclass_5 takes a value and reclassifies it into 1 of 5 classes given a list of values for class thresholds and order preference.
# Create a function that reclassifies an array into 5 classes.
def reclass_5(val, class_list, order):
"""
Takes a value and reclassifies it into 1 of 5 classes given a list
of values for class thresholds and order preference.
Assumes no overall minimum or maximum.
Requires NumPy to run to account for np.NaN values.
Inputs:
val = value to classify
class_list = numeric list of class thresholds in any order
e.g. [1.5, 3, 6, 4.5] or [100, 200, 400, 800]
order = 'low' or 'high' for which values are preferable
e.g. 'low' indicates lower values are preferable
Example:
>>> thresholds = [200, 800, 400, 600]
>>> reclass_5(693, thresholds, 'high')
4
"""
# Assert class_list is a list, has four values, and all values are numeric.
assert type(class_list)==list, "class_list must be a list."
assert len(class_list)==4, "class_list must have four values."
assert all(isinstance(x, (int, float)) for x in class_list), "class_list must be comprised of only numbers."
# Sort class_list descending.
class_sort = sorted(class_list, reverse=True)
# Return np.NaN if value is np.NaN.
if np.isnan(val):
return np.NaN
# Reclassify if lower values are preferred.
elif order=='low':
if val >= class_sort[0]:
return 1
elif val >= class_sort[1]:
return 2
elif val >= class_sort[2]:
return 3
elif val >= class_sort[3]:
return 4
else:
return 5
# Reclassify if higher values are preferred.
elif order=='high':
if val >= class_sort[0]:
return 5
elif val >= class_sort[1]:
return 4
elif val >= class_sort[2]:
return 3
elif val >= class_sort[3]:
return 2
else:
return 1
# Calculate values to separate median into five quantiles.
quintiles = [0.2, 0.4, 0.6, 0.8]
nodes_quints = quantiles(zcta_nodes, 'nodes_density', quintiles)
nodes_quints
[8.453071120523035, 12.466739130549241, 17.545737265625238, 23.632847480082592]
# Reclassify nodes_density.
zcta_nodes['nodes_reclass'] = zcta_nodes['nodes_density'].apply(lambda x: reclass_5(x, nodes_quints, 'high'))
# View top and bottom five transit-dense ZCTAs.
zcta_nodes.sort_values(by='nodes_reclass', ascending=False)
| index | STATEFP00 | ZCTA5CE00 | GEOID00 | CLASSFP00 | MTFCC00 | FUNCSTAT00 | ALAND00 | AWATER00 | INTPTLAT00 | INTPTLON00 | PARTFLG00 | geometry | bus_stop_count | rt_stop_count | train_stop_count | nodes_count | nodes_density | nodes_reclass | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 61 | 495 | 25 | 02222 | 2502222 | B5 | G6350 | S | 77896 | 0 | +42.3664695 | -071.0624939 | N | POLYGON ((235839.790 902134.685, 235852.488 90... | 0.0 | 0.0 | 10.0 | 10.0 | 128.417764 | 5 |
| 49 | 378 | 25 | 02118 | 2502118 | B5 | G6350 | S | 2747439 | 0 | +42.3378630 | -071.0708157 | N | POLYGON ((235192.725 898164.265, 235174.169 89... | 87.0 | 15.0 | 1.0 | 103.0 | 37.491925 | 5 |
| 18 | 116 | 25 | 02210 | 2502210 | B5 | G6350 | S | 2023105 | 0 | +42.3476816 | -071.0417309 | N | MULTIPOLYGON (((236771.808 900243.394, 236722.... | 39.0 | 13.0 | 0.0 | 52.0 | 26.095544 | 5 |
| 17 | 115 | 25 | 02115 | 2502115 | B5 | G6350 | S | 1979415 | 162098 | +42.3375449 | -071.1061732 | N | POLYGON ((233785.591 900215.963, 233773.509 90... | 48.0 | 9.0 | 0.0 | 57.0 | 26.618441 | 5 |
| 32 | 252 | 25 | 02127 | 2502127 | B5 | G6350 | S | 5184543 | 580371 | +42.3360044 | -071.0386949 | N | MULTIPOLYGON (((236779.574 897152.363, 236737.... | 121.0 | 2.0 | 0.0 | 123.0 | 23.801115 | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 50 | 379 | 25 | 02114 | 2502114 | B5 | G6350 | S | 1065352 | 343761 | +42.3631745 | -071.0686463 | N | MULTIPOLYGON (((234691.735 900849.190, 234663.... | 5.0 | 3.0 | 0.0 | 8.0 | 5.945953 | 1 |
| 22 | 176 | 25 | 02163 | 2502163 | B5 | G6350 | S | 316665 | 32165 | +42.3661684 | -071.1228503 | N | POLYGON ((231019.106 902090.531, 231035.630 90... | 1.0 | 0.0 | 0.0 | 1.0 | 2.866897 | 1 |
| 43 | 365 | 25 | 02466 | 2502466 | B5 | G6350 | S | 4650253 | 263098 | +42.3444566 | -071.2486166 | N | POLYGON ((219787.259 899807.604, 219812.724 89... | 29.0 | 1.0 | 2.0 | 32.0 | 6.513284 | 1 |
| 46 | 370 | 25 | 02467 | 2502467 | B5 | G6350 | S | 5755959 | 108004 | +42.3220888 | -071.1727600 | N | POLYGON ((226191.027 898407.653, 226180.838 89... | 9.0 | 1.0 | 0.0 | 10.0 | 1.705442 | 1 |
| 21 | 171 | 25 | 02184 | 2502184 | B5 | G6350 | S | 35903196 | 2081541 | +42.2054158 | -071.0021680 | N | MULTIPOLYGON (((237408.847 882805.326, 237292.... | 167.0 | 1.0 | 1.0 | 169.0 | 4.494594 | 1 |
62 rows × 19 columns
# Map the reclassified mass transit density in each ZCTA.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
zcta_nodes.plot(column='nodes_reclass',
legend=True,
edgecolor='black',
cmap='plasma_r',
legend_kwds={'label': "Reclassified mass transit density"},
ax=ax)
tufts_bu.plot(ax=ax, color='aquamarine', markersize=50, zorder=10, label='Tufts and BU')
tufts_bu_paths.plot(ax=ax, color='aquamarine', alpha=0.5, label='Shortest Paths')
plt.title('Reclassified Density of Mass Transit by ZCTA in Boston Area', fontsize=16)
plt.legend()
plt.show()
# View ZCTAs with the highest densities of mass transit stops.
zcta_nodes.loc[zcta_nodes.nodes_reclass==5]['ZCTA5CE00']
2 02143 17 02115 18 02210 29 02119 32 02127 33 02109 40 02215 41 02121 48 02110 49 02118 56 02108 59 02111 61 02222 Name: ZCTA5CE00, dtype: object
Rent is a crucial expense for any non-homeowner and a large cost-burden, especially in an urban area. Rent affordability in this study was judged by the median rent for a two- or three-bedroom home in each ZCTA.
Data was obtained from Jeff Kaufman's Apartment Price Map, which scrapes data from Padmapper. September 2020 data was used because most leases in the area begin in September. Though the dataset does not cover the entire defined extent (53 of 62 ZCTAs are represented), it was the best source of point data found for the Boston Area.
Zillow data was also considered as a source, but it was much more limited in extent. Only 37 of the 62 ZCTAs in question were available in the Zillow dataset. Zillow analysis has been included as an appendix at the end of the study for those who are curious.
The following steps were used to analyze rental data:
len, min, max, median, mean, std. # Read in CSV file of Boston Area rental data, obtained from Padmapper via Jeff Kaufman.
rent_df = pd.read_csv('./data/20200919_rental_data.csv')
# Convert to GeoDataFrame, setting CRS to EPSG:4326
rent = gpd.GeoDataFrame(rent_df, geometry=gpd.points_from_xy(rent_df.longitude, rent_df.latitude))
rent = rent.set_crs('epsg:4326')
# View original rent dataset.
rent
| price | rooms | ID | longitude | latitude | geometry | |
|---|---|---|---|---|---|---|
| 0 | 2900 | 1 | 41791420 | -71.172164 | 42.278143 | POINT (-71.17216 42.27814) |
| 1 | 1960 | 1 | 29649753 | -71.157981 | 42.296967 | POINT (-71.15798 42.29697) |
| 2 | 1960 | 1 | 5036904 | -71.157926 | 42.296937 | POINT (-71.15793 42.29694) |
| 3 | 2330 | 1 | 16243592 | -71.157935 | 42.297520 | POINT (-71.15793 42.29752) |
| 4 | 2770 | 2 | 16243593 | -71.157935 | 42.297520 | POINT (-71.15793 42.29752) |
| ... | ... | ... | ... | ... | ... | ... |
| 22345 | 1550 | 1 | 43007574 | -71.020512 | 42.418345 | POINT (-71.02051 42.41835) |
| 22346 | 1800 | 1 | 43014171 | -71.003435 | 42.410790 | POINT (-71.00344 42.41079) |
| 22347 | 1950 | 2 | 43026486 | -71.021522 | 42.401784 | POINT (-71.02152 42.40178) |
| 22348 | 2400 | 3 | 43057610 | -70.998380 | 42.407041 | POINT (-70.99838 42.40704) |
| 22349 | 2982 | 1 | 43084453 | -70.991490 | 42.395760 | POINT (-70.99149 42.39576) |
22350 rows × 6 columns
# convert_n_clip rental data to CRS and extent of study area rt_zcta.
rent = convert_n_clip(rent, rt_zcta)
rent
| price | rooms | ID | longitude | latitude | geometry | |
|---|---|---|---|---|---|---|
| 0 | 2900 | 1 | 41791420 | -71.172164 | 42.278143 | POINT (227041.857 892015.211) |
| 1 | 1960 | 1 | 29649753 | -71.157981 | 42.296967 | POINT (228203.365 894110.665) |
| 2 | 1960 | 1 | 5036904 | -71.157926 | 42.296937 | POINT (228207.914 894107.351) |
| 3 | 2330 | 1 | 16243592 | -71.157935 | 42.297520 | POINT (228206.913 894172.080) |
| 4 | 2770 | 2 | 16243593 | -71.157935 | 42.297520 | POINT (228206.913 894172.080) |
| ... | ... | ... | ... | ... | ... | ... |
| 22345 | 1550 | 1 | 43007574 | -71.020512 | 42.418345 | POINT (239463.359 907647.313) |
| 22346 | 1800 | 1 | 43014171 | -71.003435 | 42.410790 | POINT (240873.695 906816.217) |
| 22347 | 1950 | 2 | 43026486 | -71.021522 | 42.401784 | POINT (239390.553 905807.311) |
| 22348 | 2400 | 3 | 43057610 | -70.998380 | 42.407041 | POINT (241292.264 906402.161) |
| 22349 | 2982 | 1 | 43084453 | -70.991490 | 42.395760 | POINT (241866.889 905152.495) |
22303 rows × 6 columns
# Narrow down listings dataset to 2 or 3 bedrooms.
rent_br = rent.loc[rent.rooms.isin([2, 3])]
rent_br
| price | rooms | ID | longitude | latitude | geometry | |
|---|---|---|---|---|---|---|
| 4 | 2770 | 2 | 16243593 | -71.157935 | 42.297520 | POINT (228206.913 894172.080) |
| 5 | 2820 | 2 | 32940159 | -71.157792 | 42.296119 | POINT (228219.328 894016.525) |
| 6 | 2850 | 2 | 32940162 | -71.157792 | 42.296119 | POINT (228219.328 894016.525) |
| 7 | 2850 | 2 | 32940169 | -71.157792 | 42.296119 | POINT (228219.328 894016.525) |
| 8 | 3000 | 2 | 35159221 | -71.157429 | 42.296666 | POINT (228249.035 894077.370) |
| ... | ... | ... | ... | ... | ... | ... |
| 22342 | 1800 | 2 | 42980951 | -71.024810 | 42.424957 | POINT (239105.533 908379.772) |
| 22343 | 2700 | 3 | 42995819 | -71.015314 | 42.410007 | POINT (239896.429 906723.576) |
| 22344 | 2700 | 3 | 43004511 | -71.015451 | 42.410235 | POINT (239885.017 906748.815) |
| 22347 | 1950 | 2 | 43026486 | -71.021522 | 42.401784 | POINT (239390.553 905807.311) |
| 22348 | 2400 | 3 | 43057610 | -70.998380 | 42.407041 | POINT (241292.264 906402.161) |
10437 rows × 6 columns
# View statistics on price of 2-3BR homes.
rent_br['price'].describe()
count 10437.000000 mean 3109.432021 std 1480.840304 min 800.000000 25% 2400.000000 50% 2850.000000 75% 3465.000000 max 61005.000000 Name: price, dtype: float64
# View histogram for distribution of rental prices.
sns.histplot(data=rent_br, x='price')
plt.title('Distribution of Rental Prices for 2-3BR Homes in Boston Area')
plt.show()
The range of rental prices for 2-3 bedroom homes is very large but the vast majority are concentrated at the lower end. A colormap to map rental prices was accordingly chosen for the most variation at the lower end.
# Plot rental listings by price with schools.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
boston_water.plot(ax=ax, alpha=0.3)
rent_br.plot(ax=ax, column='price', cmap='gist_stern', markersize=1, label='2-3BR Rentals')
tufts_bu.plot(ax=ax, color='aquamarine', markersize=50, zorder=10, label='Tufts and BU')
tufts_bu_paths.plot(ax=ax, color='aquamarine', alpha=0.5, label='Shortest Paths')
plt.title('2-3BR Rentals within Boston Area Rapid Transit Extent', fontsize=16)
plt.legend()
plt.show()
# Conduct spatial join to match rental listings with ZCTAs.
rent_br_zcta = gpd.sjoin(rent_br, rt_zcta, how='left', op='within')
rent_br_zcta
| price | rooms | ID | longitude | latitude | geometry | index_right | index | STATEFP00 | ZCTA5CE00 | GEOID00 | CLASSFP00 | MTFCC00 | FUNCSTAT00 | ALAND00 | AWATER00 | INTPTLAT00 | INTPTLON00 | PARTFLG00 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 2770 | 2 | 16243593 | -71.157935 | 42.297520 | POINT (228206.913 894172.080) | 16 | 54 | 25 | 02132 | 2502132 | B5 | G6350 | S | 11715729 | 249516 | +42.2809050 | -071.1639646 | N |
| 5 | 2820 | 2 | 32940159 | -71.157792 | 42.296119 | POINT (228219.328 894016.525) | 16 | 54 | 25 | 02132 | 2502132 | B5 | G6350 | S | 11715729 | 249516 | +42.2809050 | -071.1639646 | N |
| 6 | 2850 | 2 | 32940162 | -71.157792 | 42.296119 | POINT (228219.328 894016.525) | 16 | 54 | 25 | 02132 | 2502132 | B5 | G6350 | S | 11715729 | 249516 | +42.2809050 | -071.1639646 | N |
| 7 | 2850 | 2 | 32940169 | -71.157792 | 42.296119 | POINT (228219.328 894016.525) | 16 | 54 | 25 | 02132 | 2502132 | B5 | G6350 | S | 11715729 | 249516 | +42.2809050 | -071.1639646 | N |
| 8 | 3000 | 2 | 35159221 | -71.157429 | 42.296666 | POINT (228249.035 894077.370) | 16 | 54 | 25 | 02132 | 2502132 | B5 | G6350 | S | 11715729 | 249516 | +42.2809050 | -071.1639646 | N |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 22342 | 1800 | 2 | 42980951 | -71.024810 | 42.424957 | POINT (239105.533 908379.772) | 58 | 177 | 25 | 02151 | 2502151 | B5 | G6350 | S | 14467229 | 996735 | +42.4182781 | -071.0050847 | N |
| 22343 | 2700 | 3 | 42995819 | -71.015314 | 42.410007 | POINT (239896.429 906723.576) | 58 | 177 | 25 | 02151 | 2502151 | B5 | G6350 | S | 14467229 | 996735 | +42.4182781 | -071.0050847 | N |
| 22344 | 2700 | 3 | 43004511 | -71.015451 | 42.410235 | POINT (239885.017 906748.815) | 58 | 177 | 25 | 02151 | 2502151 | B5 | G6350 | S | 14467229 | 996735 | +42.4182781 | -071.0050847 | N |
| 22347 | 1950 | 2 | 43026486 | -71.021522 | 42.401784 | POINT (239390.553 905807.311) | 79 | 254 | 25 | 02150 | 2502150 | B5 | G6350 | S | 5772123 | 1921 | +42.3966895 | -071.0327234 | N |
| 22348 | 2400 | 3 | 43057610 | -70.998380 | 42.407041 | POINT (241292.264 906402.161) | 58 | 177 | 25 | 02151 | 2502151 | B5 | G6350 | S | 14467229 | 996735 | +42.4182781 | -071.0050847 | N |
10437 rows × 19 columns
# Calculate statistics on price by ZCTA.
zcta_rent_stats = rent_br_zcta.groupby('ZCTA5CE00').price.agg([len, min, max, np.median, np.mean, np.std])
zcta_rent_stats
| len | min | max | median | mean | std | |
|---|---|---|---|---|---|---|
| ZCTA5CE00 | ||||||
| 02108 | 58 | 2125 | 20535 | 3600.0 | 4951.155172 | 3328.894315 |
| 02109 | 84 | 1800 | 14500 | 3600.0 | 3829.035714 | 1671.594257 |
| 02110 | 28 | 3825 | 8400 | 3825.0 | 4413.178571 | 1081.703425 |
| 02111 | 131 | 1700 | 20000 | 3793.0 | 4423.022901 | 2219.974205 |
| 02113 | 340 | 1700 | 37775 | 2800.0 | 3010.494118 | 2109.259743 |
| 02114 | 443 | 1675 | 13865 | 3095.0 | 3350.467269 | 1212.704073 |
| 02115 | 525 | 1425 | 29000 | 3000.0 | 3364.878095 | 1659.890433 |
| 02116 | 317 | 2095 | 25000 | 3930.0 | 4536.722397 | 2432.143375 |
| 02118 | 470 | 1700 | 12591 | 3497.5 | 3655.193617 | 1135.062965 |
| 02119 | 124 | 1800 | 8716 | 2700.0 | 2789.048387 | 834.638638 |
| 02120 | 186 | 1800 | 5455 | 3000.0 | 3102.408602 | 683.584420 |
| 02121 | 15 | 1900 | 4000 | 2800.0 | 2850.000000 | 670.820393 |
| 02122 | 47 | 1800 | 28000 | 2375.0 | 2915.531915 | 3751.447581 |
| 02124 | 62 | 1400 | 4000 | 2450.0 | 2482.419355 | 500.719762 |
| 02125 | 104 | 800 | 6046 | 2575.0 | 2747.125000 | 692.623655 |
| 02126 | 12 | 1900 | 2600 | 2037.5 | 2102.083333 | 233.174324 |
| 02127 | 380 | 1800 | 15324 | 3400.0 | 3470.139474 | 1019.342855 |
| 02128 | 224 | 1650 | 5791 | 2500.0 | 2681.517857 | 778.424343 |
| 02129 | 91 | 2000 | 10251 | 3300.0 | 3532.615385 | 1242.872495 |
| 02130 | 277 | 1700 | 15477 | 2750.0 | 3290.010830 | 2082.450936 |
| 02131 | 50 | 1875 | 6139 | 2500.0 | 2790.240000 | 965.723702 |
| 02132 | 80 | 1700 | 4590 | 2650.0 | 2617.887500 | 513.956926 |
| 02134 | 653 | 1458 | 6439 | 2500.0 | 2660.522205 | 685.507412 |
| 02135 | 1052 | 850 | 6520 | 2328.5 | 2508.267110 | 642.369844 |
| 02136 | 8 | 1500 | 2900 | 1935.0 | 2152.500000 | 529.973045 |
| 02138 | 363 | 1800 | 7960 | 3000.0 | 3155.355372 | 787.717616 |
| 02139 | 364 | 1800 | 9576 | 2900.0 | 3088.434066 | 878.187583 |
| 02140 | 195 | 1825 | 4800 | 2850.0 | 2963.651282 | 675.164723 |
| 02141 | 203 | 1550 | 17312 | 2850.0 | 3140.295567 | 1527.734844 |
| 02142 | 41 | 2920 | 9950 | 3415.0 | 4719.146341 | 2291.116252 |
| 02143 | 356 | 1000 | 6413 | 2600.0 | 2686.292135 | 606.074856 |
| 02144 | 355 | 1800 | 12168 | 2595.0 | 2668.177465 | 704.155943 |
| 02145 | 262 | 1700 | 12853 | 2525.0 | 2715.202290 | 1029.238874 |
| 02148 | 95 | 1650 | 3999 | 2200.0 | 2354.926316 | 516.492101 |
| 02149 | 47 | 1850 | 6600 | 2400.0 | 2805.404255 | 1147.863491 |
| 02150 | 39 | 1700 | 6854 | 2400.0 | 2588.820513 | 880.597013 |
| 02151 | 29 | 1700 | 10085 | 2502.0 | 2883.413793 | 1623.189926 |
| 02155 | 255 | 950 | 6935 | 2300.0 | 2439.800000 | 619.721879 |
| 02169 | 8 | 1650 | 2950 | 2494.5 | 2436.125000 | 486.600581 |
| 02170 | 10 | 1595 | 2995 | 1900.0 | 2046.500000 | 413.729984 |
| 02171 | 10 | 2100 | 6995 | 2500.0 | 3134.200000 | 1519.386484 |
| 02186 | 7 | 2195 | 4500 | 2750.0 | 2956.428571 | 811.350285 |
| 02199 | 4 | 4989 | 16440 | 7937.5 | 9326.000000 | 5221.182561 |
| 02210 | 98 | 3265 | 61005 | 4791.5 | 5767.857143 | 6137.020209 |
| 02215 | 677 | 1750 | 14967 | 3116.0 | 3382.194978 | 1329.241393 |
| 02222 | 2 | 2400 | 5980 | 4190.0 | 4190.000000 | 2531.442277 |
| 02445 | 263 | 1850 | 12761 | 2600.0 | 2788.733840 | 837.589096 |
| 02446 | 801 | 1300 | 13776 | 3000.0 | 3206.340824 | 985.818593 |
| 02458 | 12 | 2000 | 4200 | 2300.0 | 2449.166667 | 578.323727 |
| 02467 | 30 | 1750 | 5895 | 3125.0 | 3394.000000 | 1049.389872 |
| 02472 | 55 | 1750 | 3625 | 2400.0 | 2485.272727 | 358.625744 |
| 02474 | 47 | 1775 | 3600 | 2400.0 | 2473.914894 | 459.374472 |
| 02478 | 48 | 1900 | 4030 | 2200.0 | 2392.708333 | 418.349033 |
# View statistics for median rent prices.
zcta_rent_stats['median'].describe()
count 53.000000 mean 2927.981132 std 912.654323 min 1900.000000 25% 2450.000000 50% 2700.000000 75% 3116.000000 max 7937.500000 Name: median, dtype: float64
# View histogram for distribution of nodes_density.
sns.histplot(data=zcta_rent_stats, x='median')
plt.title('Distribution of Median Rent Price for 2-3BR Home per ZCTA in Boston Area')
plt.show()
The highest median rental price is an outlier derived from 4 listings in ZCTA 02199, which is the Prudential Center. Similar to ZCTA 02222 in Mass Transit Density, the decision was made to allow the outlier as reclassification would nearly negate the skew.
# Merge statistics with spatial ZCTA rental GDF.
zcta_rent = rt_zcta.merge(zcta_rent_stats, on='ZCTA5CE00', how='left')
zcta_rent.info()
<class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 62 entries, 0 to 61 Data columns (total 19 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 index 62 non-null int64 1 STATEFP00 62 non-null object 2 ZCTA5CE00 62 non-null object 3 GEOID00 62 non-null object 4 CLASSFP00 62 non-null object 5 MTFCC00 62 non-null object 6 FUNCSTAT00 62 non-null object 7 ALAND00 62 non-null int64 8 AWATER00 62 non-null int64 9 INTPTLAT00 62 non-null object 10 INTPTLON00 62 non-null object 11 PARTFLG00 62 non-null object 12 geometry 62 non-null geometry 13 len 53 non-null float64 14 min 53 non-null float64 15 max 53 non-null float64 16 median 53 non-null float64 17 mean 53 non-null float64 18 std 53 non-null float64 dtypes: float64(6), geometry(1), int64(3), object(9) memory usage: 9.7+ KB
# Rename columns to include rent (important for merging data later).
zcta_rent = zcta_rent.rename(columns={'len':'rent_len', 'min':'rent_min', 'max':'rent_max', 'median':'rent_median', 'mean':'rent_mean', 'std':'rent_std'})
# Map the median rent price in each ZCTA.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
zcta_rent.plot(column='rent_median',
legend=True,
edgecolor='black',
cmap='YlGnBu',
legend_kwds={'label': "Median rental price"},
ax=ax)
tufts_bu.plot(ax=ax, color='maroon', markersize=50, zorder=10, label='Tufts and BU')
tufts_bu_paths.plot(ax=ax, color='maroon', alpha=0.5, label='Shortest Paths')
plt.title('Median Rental Price for 2-3BR Homes by ZCTA in Boston Area', fontsize=16)
plt.legend()
plt.show()
# Calculate values to separate median into five quantiles.
# Use same quintiles list of thresholds from reclassifying transit.
rent_quants = quantiles(zcta_rent_stats, 'median', quintiles)
rent_quants
[2400.0, 2565.0, 2850.0, 3360.0]
# Reclassify median rent prices with reclass_5 function and quintile values.
zcta_rent['rent_median_reclass'] = zcta_rent['rent_median'].apply(lambda x: reclass_5(x, rent_quants, 'low'))
# View top and bottom five median rental ZCTAs.
zcta_rent.sort_values(by='rent_median_reclass', ascending=False)
| index | STATEFP00 | ZCTA5CE00 | GEOID00 | CLASSFP00 | MTFCC00 | FUNCSTAT00 | ALAND00 | AWATER00 | INTPTLAT00 | INTPTLON00 | PARTFLG00 | geometry | rent_len | rent_min | rent_max | rent_median | rent_mean | rent_std | rent_median_reclass | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 31 | 25 | 02458 | 2502458 | B5 | G6350 | S | 4709183 | 57045 | +42.3541154 | -071.1891300 | N | MULTIPOLYGON (((227500.026 901300.219, 227482.... | 12.0 | 2000.0 | 4200.0 | 2300.0 | 2449.166667 | 578.323727 | 5.0 |
| 11 | 58 | 25 | 02126 | 2502126 | B5 | G6350 | S | 5260166 | 67030 | +42.2748559 | -071.0929083 | N | MULTIPOLYGON (((232299.828 890594.954, 232285.... | 12.0 | 1900.0 | 2600.0 | 2037.5 | 2102.083333 | 233.174324 | 5.0 |
| 39 | 305 | 25 | 02135 | 2502135 | B5 | G6350 | S | 7222258 | 469151 | +42.3484167 | -071.1543392 | N | POLYGON ((227999.354 898147.705, 227992.923 89... | 1052.0 | 850.0 | 6520.0 | 2328.5 | 2508.267110 | 642.369844 | 5.0 |
| 31 | 251 | 25 | 02122 | 2502122 | B5 | G6350 | S | 4420941 | 9382 | +42.2950881 | -071.0525334 | N | POLYGON ((237311.389 892366.860, 237300.520 89... | 47.0 | 1800.0 | 28000.0 | 2375.0 | 2915.531915 | 3751.447581 | 5.0 |
| 27 | 237 | 25 | 02478 | 2502478 | B5 | G6350 | S | 11967374 | 186828 | +42.3953169 | -071.1802840 | N | POLYGON ((227491.087 906255.384, 227559.274 90... | 48.0 | 1900.0 | 4030.0 | 2200.0 | 2392.708333 | 418.349033 | 5.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 25 | 229 | 25 | 02460 | 2502460 | B5 | G6350 | S | 3651425 | 20856 | +42.3520061 | -071.2092424 | N | POLYGON ((224712.729 900305.384, 224780.933 90... | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 26 | 236 | 25 | 02461 | 2502461 | B5 | G6350 | S | 3263402 | 13513 | +42.3190310 | -071.2081690 | N | POLYGON ((223647.892 895122.104, 223644.504 89... | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 35 | 285 | 25 | 02459 | 2502459 | B5 | G6350 | S | 12417620 | 237715 | +42.3142917 | -071.1946111 | N | MULTIPOLYGON (((226381.301 895402.075, 226417.... | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 43 | 365 | 25 | 02466 | 2502466 | B5 | G6350 | S | 4650253 | 263098 | +42.3444566 | -071.2486166 | N | POLYGON ((219787.259 899807.604, 219812.724 89... | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 44 | 366 | 25 | 02465 | 2502465 | B5 | G6350 | S | 5288332 | 698 | +42.3495328 | -071.2273393 | N | POLYGON ((223352.806 900852.454, 223367.983 90... | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
62 rows × 20 columns
# Map the reclassified median rent price in each ZCTA.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
zcta_rent.plot(column='rent_median_reclass',
legend=True,
edgecolor='black',
cmap='plasma_r', #reverse colormap to indicate lower values are preferable
legend_kwds={'label': "Reclassified median rental price"},
ax=ax)
tufts_bu.plot(ax=ax, color='aquamarine', markersize=50, zorder=10, label='Tufts and BU')
tufts_bu_paths.plot(ax=ax, color='aquamarine', alpha=0.5, label='Shortest Paths')
plt.title('Reclassified Median Rental Price for 2-3BR Homes by ZCTA in Boston Area', fontsize=16)
plt.legend()
plt.show()
# View ZCTAs with the lowest median rental price for a 2- or 3-bedroom home.
zcta_rent.loc[zcta_rent.rent_median_reclass==5]['ZCTA5CE00']
0 02458 11 02126 12 02148 15 02170 19 02155 27 02478 28 02136 31 02122 39 02135 Name: ZCTA5CE00, dtype: object
According to this analysis, rents are more afforable farther from the city center. This median rent price map is nearly the inverse of the mass transit density map.
Though rental data was only available in 53 of the 62 ZCTAs, the remaining indicators will be analyzed for the entire extent. ZCTAs lacking rental data will be excluded in the overall index calculation at the end to ensure the analysis's fairness across ZCTAs.
Accessibility of necessities and amenities is crucial to living anywhere. For the purposes of this study, necessities were defined as follows:
Some data was imported from sources such as MassGIS while others were retrieved using OpenStreetMap and OSMnx's geometries_from_polygon function.
As these three indicators each has more than one dataset to base analysis on (unlike the preceding rental data), the function calc_density was created to automate density calculations based on counts of records in multiple datasets (e.g. density of food establishments per square kilometer in each ZCTA based on counts of groceries, prepared food, and farmers markets). calc_density can be found in the food analysis where it was first used.
OSMnx¶To use OSMnx geometries_from_polygon, a new polygon of the extent's convex hull was created in latitude-longitude coordinates by converting the T stops shapefile to EPSG:4326 for lat-long and creating a convex hull with unary_union.convex_hull.
# Extract bounds of T Stops.
graph_extent_latlong = bos_rt_node.to_crs('epsg:4326').unary_union.convex_hull.buffer(0.05)
graph_extent_latlong
Groceries and prepared food establishments within this study were found using OSMnx and the appropriate OSM tags.
Farmers markets were obtained from MassGIS.
# Retrieve groceries features within graph_extent_latlong from OSMnx and view info.
grocery_tags = {'shop':['supermarket', 'grocery', 'greengrocer', 'bakery', 'butcher', 'deli', 'dairy', 'farm', 'seafood']}
grocery = ox.geometries_from_polygon(graph_extent_latlong, grocery_tags)
grocery = convert_n_clip(grocery, rt_zcta)
grocery.info()
<class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 221 entries, 0 to 251 Columns: 105 entries, unique_id to type dtypes: geometry(1), int64(1), object(103) memory usage: 183.0+ KB
# View first five rows of groceries GDF.
grocery.head()
| unique_id | osmid | element_type | brand | brand:wikidata | brand:wikipedia | name | note | shop | geometry | ... | alt_name:ko | name:zh-Hans | name:zh-Hant | building:colour | building:min_level | roof:colour | toilets:wheelchair | landuse | ways | type | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | node/314092543 | 314092543 | node | Stop & Shop | Q3658429 | en:Stop & Shop | Stop & Shop | store new since 2006 aerial imagery | supermarket | POINT (235831.905 897407.979) | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | node/355795164 | 355795164 | node | NaN | NaN | NaN | La Ronga Bakery & Delicatessen | NaN | bakery | POINT (232151.416 903802.162) | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | node/431377975 | 431377975 | node | NaN | NaN | NaN | Quebrada Baking Co. | NaN | bakery | POINT (229412.784 906211.019) | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | node/460001186 | 460001186 | node | NaN | NaN | NaN | Star Market | NaN | supermarket | POINT (231430.529 904430.448) | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | node/460211013 | 460211013 | node | NaN | NaN | NaN | Star Market | NaN | supermarket | POINT (228901.367 902823.634) | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 105 columns
# Use count_records on groceries GDF.
zcta_grocery_count = count_records(grocery, rt_zcta, 'ZCTA5CE00', 'grocery_count')
zcta_grocery_count
| ZCTA5CE00 | grocery_count | |
|---|---|---|
| 0 | 02138 | 14 |
| 1 | 02143 | 11 |
| 2 | 02130 | 9 |
| 3 | 02169 | 9 |
| 4 | 02472 | 8 |
| 5 | 02134 | 8 |
| 6 | 02446 | 8 |
| 7 | 02139 | 8 |
| 8 | 02155 | 7 |
| 9 | 02148 | 7 |
| 10 | 02128 | 7 |
| 11 | 02144 | 7 |
| 12 | 02215 | 6 |
| 13 | 02149 | 6 |
| 14 | 02115 | 6 |
| 15 | 02140 | 6 |
| 16 | 02145 | 5 |
| 17 | 02478 | 5 |
| 18 | 02114 | 4 |
| 19 | 02474 | 4 |
| 20 | 02116 | 4 |
| 21 | 02127 | 4 |
| 22 | 02113 | 4 |
| 23 | 02136 | 3 |
| 24 | 02118 | 3 |
| 25 | 02184 | 3 |
| 26 | 02135 | 3 |
| 27 | 02170 | 3 |
| 28 | 02108 | 3 |
| 29 | 02141 | 3 |
| 30 | 02111 | 3 |
| 31 | 02464 | 2 |
| 32 | 02132 | 2 |
| 33 | 02186 | 2 |
| 34 | 02151 | 2 |
| 35 | 02222 | 2 |
| 36 | 02467 | 2 |
| 37 | 02445 | 2 |
| 38 | 02460 | 2 |
| 39 | 02210 | 2 |
| 40 | 02465 | 2 |
| 41 | 02150 | 2 |
| 42 | 02125 | 2 |
| 43 | 02109 | 1 |
| 44 | 02131 | 1 |
| 45 | 02126 | 1 |
| 46 | 02142 | 1 |
| 47 | 02124 | 1 |
| 48 | 02122 | 1 |
| 49 | 02110 | 1 |
| 50 | 02466 | 1 |
| 51 | 02459 | 1 |
| 52 | 02121 | 1 |
| 53 | 02199 | 1 |
| 54 | 02120 | 1 |
| 55 | 02129 | 1 |
| 56 | 02119 | 1 |
# Retrieve prepared food features within graph_extent_latlong from OSMnx and view info.
prep_food_tags = {'amenity':['cafe', 'restaurant', 'fast_food']}
prep_food = ox.geometries_from_polygon(graph_extent_latlong, prep_food_tags)
prep_food = convert_n_clip(prep_food, rt_zcta)
prep_food.info()
<class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 2039 entries, 0 to 2283 Columns: 177 entries, unique_id to type dtypes: geometry(1), int64(1), object(175) memory usage: 2.8+ MB
# View first five rows of prepared food GDF.
prep_food.head()
| unique_id | osmid | element_type | amenity | cuisine | name | name:zh | geometry | addr:housenumber | opening_hours | ... | height | heritage | biergarten | landuse | building:colour | restaurant | roof:material | former:building | ways | type | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | node/325419354 | 325419354 | node | restaurant | chinese | Dar Hee | 達熙 | POINT (237094.772 907512.932) | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | node/326391966 | 326391966 | node | restaurant | NaN | Tawakal Halal | NaN | POINT (238793.752 901798.067) | 389 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | node/329705548 | 329705548 | node | fast_food | pizza | Big Daddy's | NaN | POINT (229587.646 901377.423) | NaN | Mo-Su 11:00-22:00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | node/345423108 | 345423108 | node | restaurant | pizza | Romanza Pizzaria | NaN | POINT (233987.726 904915.453) | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | node/345423146 | 345423146 | node | fast_food | NaN | Dunkin' Donuts | NaN | POINT (232394.959 903738.778) | 519 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 177 columns
# Use count_records on prepared food GDF.
zcta_prep_food_count = count_records(prep_food, rt_zcta, 'ZCTA5CE00', 'prep_food_count')
zcta_prep_food_count
| ZCTA5CE00 | prep_food_count | |
|---|---|---|
| 0 | 02116 | 136 |
| 1 | 02139 | 112 |
| 2 | 02138 | 112 |
| 3 | 02128 | 108 |
| 4 | 02215 | 95 |
| 5 | 02115 | 81 |
| 6 | 02148 | 75 |
| 7 | 02143 | 64 |
| 8 | 02446 | 63 |
| 9 | 02111 | 61 |
| 10 | 02134 | 59 |
| 11 | 02144 | 58 |
| 12 | 02210 | 56 |
| 13 | 02108 | 55 |
| 14 | 02114 | 52 |
| 15 | 02130 | 44 |
| 16 | 02135 | 43 |
| 17 | 02113 | 43 |
| 18 | 02110 | 41 |
| 19 | 02155 | 40 |
| 20 | 02184 | 39 |
| 21 | 02474 | 38 |
| 22 | 02109 | 35 |
| 23 | 02142 | 35 |
| 24 | 02169 | 33 |
| 25 | 02127 | 32 |
| 26 | 02149 | 30 |
| 27 | 02472 | 30 |
| 28 | 02140 | 29 |
| 29 | 02145 | 29 |
| 30 | 02150 | 28 |
| 31 | 02118 | 25 |
| 32 | 02478 | 22 |
| 33 | 02199 | 21 |
| 34 | 02141 | 20 |
| 35 | 02445 | 20 |
| 36 | 02151 | 16 |
| 37 | 02129 | 16 |
| 38 | 02120 | 13 |
| 39 | 02459 | 12 |
| 40 | 02125 | 10 |
| 41 | 02467 | 10 |
| 42 | 02171 | 10 |
| 43 | 02119 | 9 |
| 44 | 02122 | 8 |
| 45 | 02124 | 8 |
| 46 | 02465 | 7 |
| 47 | 02136 | 7 |
| 48 | 02222 | 7 |
| 49 | 02464 | 6 |
| 50 | 02466 | 6 |
| 51 | 02461 | 6 |
| 52 | 02170 | 5 |
| 53 | 02132 | 5 |
| 54 | 02121 | 4 |
| 55 | 02458 | 3 |
| 56 | 02126 | 2 |
| 57 | 02460 | 2 |
| 58 | 02131 | 1 |
| 59 | 02468 | 1 |
# read_n_clip farmers markets shapefile from MassGIS and view info.
farmer_mrkt = read_n_clip('./data/farmersmarkets/FARMERSMARKETS_PT.shp', rt_zcta)
farmer_mrkt.info()
<class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 58 entries, 2 to 296 Data columns (total 18 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 MARKET_ID 58 non-null float64 1 NAME 58 non-null object 2 TYPE 58 non-null object 3 ADDR_1 58 non-null object 4 ADDR_2 55 non-null object 5 TOWN 58 non-null object 6 ZIP_CODE 58 non-null object 7 DAY_TIME 58 non-null object 8 DATES 58 non-null object 9 UPDATE_DAT 58 non-null object 10 YEAR_START 57 non-null object 11 WEBSITE 43 non-null object 12 EBT 48 non-null object 13 WIC_CVV 0 non-null object 14 COUPONS 50 non-null object 15 LONGITUDE 58 non-null float64 16 LATITUDE 58 non-null float64 17 geometry 58 non-null geometry dtypes: float64(3), geometry(1), object(14) memory usage: 8.6+ KB
# View first five rows of farmer_mrkt.
farmer_mrkt.head()
| MARKET_ID | NAME | TYPE | ADDR_1 | ADDR_2 | TOWN | ZIP_CODE | DAY_TIME | DATES | UPDATE_DAT | YEAR_START | WEBSITE | EBT | WIC_CVV | COUPONS | LONGITUDE | LATITUDE | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 535.0 | West Newton | Farmers Markets | Elm Street | off Washington Street | West Newton | 02465 | Saturday, 10:00 am - 2:00 pm | June 18 to October 8 | 2016 | 2014 | http://www.newtonma.gov/gov/parks | EBT-SNAP Accepted | None | WIC & Senior Coupons Accepted | -71.22932 | 42.34906 | POINT (222302.311 899875.783) |
| 15 | 153.0 | Everett | Farmers Markets | Everett Square | Municipal Parking Lot next to Braza Grill | Everett | 02149 | Wednesday, 1:30 pm - 6:30 pm | June to October | 2016 | 2009 | http://www.everettfarmersmarket.worldpress.com | EBT-SNAP Accepted | None | WIC & Senior Coupons Accepted | -71.05493 | 42.40779 | POINT (236636.795 906459.547) |
| 16 | 154.0 | East Boston | Farmers Markets | 209 Sumner Street | Lewis Mall across from Maverick MBTA Station | Boston | 02128 | Wednesday, 3:00 pm - 6:30 pm | July 6 to October 19 | 2016 | 2008 | http://www.ebnhc.org/farmersmarket.php | EBT-SNAP Accepted | None | WIC & Senior Coupons Accepted | -71.03940 | 42.36899 | POINT (237938.437 902156.590) |
| 17 | 160.0 | Braintree | Farmers Markets | 1 John F Kennedy Memorial Dr. | Town Hall Mall | Braintree | 02184 | Saturday, 9:00 am - 1:00 pm | June 18 to October 22, (Holiday Market 11/19) | 2016 | 2009 | http://www.braintreefarmersmarket.com | EBT-SNAP Accepted | None | WIC & Senior Coupons Accepted | -71.00496 | 42.20580 | POINT (240880.342 884046.442) |
| 21 | 167.0 | Dorchester/Dorchester House | Farmers Markets | 1353 Dorchester Ave | Dorchester House | Dorchester | 02122 | Tuesday, 11:30 am - 2:00 pm | July 5 to October 11 | 2016 | 2007 | http://www.dorchesterhouse.org/services/farmer... | EBT-SNAP Accepted | None | WIC & Senior Coupons Accepted | -71.05912 | 42.30453 | POINT (236351.175 894988.130) |
# Use count_records on farmer_mrkt.
zcta_farmer_mrkt_count = count_records(farmer_mrkt, rt_zcta, 'ZCTA5CE00', 'farmer_mrkt_count')
zcta_farmer_mrkt_count
| ZCTA5CE00 | farmer_mrkt_count | |
|---|---|---|
| 0 | 02124 | 5 |
| 1 | 02130 | 4 |
| 2 | 02108 | 3 |
| 3 | 02139 | 3 |
| 4 | 02118 | 3 |
| 5 | 02446 | 2 |
| 6 | 02143 | 2 |
| 7 | 02142 | 2 |
| 8 | 02138 | 2 |
| 9 | 02131 | 2 |
| 10 | 02122 | 2 |
| 11 | 02134 | 2 |
| 12 | 02144 | 1 |
| 13 | 02128 | 1 |
| 14 | 02132 | 1 |
| 15 | 02126 | 1 |
| 16 | 02127 | 1 |
| 17 | 02116 | 1 |
| 18 | 02125 | 1 |
| 19 | 02119 | 1 |
| 20 | 02474 | 1 |
| 21 | 02151 | 1 |
| 22 | 02135 | 1 |
| 23 | 02215 | 1 |
| 24 | 02145 | 1 |
| 25 | 02149 | 1 |
| 26 | 02459 | 1 |
| 27 | 02115 | 1 |
| 28 | 02186 | 1 |
| 29 | 02210 | 1 |
| 30 | 02472 | 1 |
| 31 | 02184 | 1 |
| 32 | 02478 | 1 |
| 33 | 02155 | 1 |
| 34 | 02465 | 1 |
| 35 | 02120 | 1 |
| 36 | 02129 | 1 |
| 37 | 02170 | 1 |
calc_density¶This function was created to automate density calculations based on counts of records in multiple datasets (e.g. density of food establishments per square kilometer in each ZCTA based on groceries, prepared food, and farmers markets).
The purpose of this function is to calculate the density of a set of records in a GeoDataFrame of polygons. It takes a base GeoDataFrame and, using function multimerge, merges it with a list of GeoDataFrames on the specified column or columns in the list of on columns and with the specified how. The GDFs within the list are expected to have columns with counts, and those columns are given in a list. The function then adds together the counts in specified columns and calculates the density.
def calc_density(orig_gdf, gdf_count_list, on_col, how, count_cols, total_count_col, density_col):
"""
The purpose of this function is to calculate the density of a set of records
in a GeoDataFrame of polygons.
Takes a base GeoDataFrame and, using function multimerge, merges it with a list of GDFs
on the specified column or list of columns and with the specified 'how'.
The GDFs within the list are expected to have columns with counts.
Adds together the counts in specified columns, then calculates the density.
Assumes on_col exists across all DFs.
Assumes units of the GDF are in meters, with density output of per sqkm.
Requires Pandas to run merge method (within multimerge) and fillna().
Inputs:
orig_gdf = base GDF
gdf_count_list = list of GDFs with counts to calculate from
e.g. [gdf1, gdf2, gdf3]
on_col = bracketed column name or list of columns (same across GDFs)
e.g. ['name'], ['name', 'address']
how = how argument
e.g. 'left'
count_cols = bracketed list of column names with counts, order does not matter
e.g. ['gdf1_count', 'gdf2_count', 'gdf3_count']
total_count_col = string name for new column with total counts
e.g. 'total_count'
density_col = string name for new column with density (per sqkm)
e.g. 'bike_density'
Example:
>>> schools = [elem, middle, high]
>>> on_col = ['town_name', 'state']
>>> count_cols = ['elem_classrooms', 'middle_classrooms', 'high_classrooms']
>>> town_schools = multimerge(town,
schools,
on_col,
'left',
count_cols,
'total_classrooms',
'classroom_density')
"""
# Run multimerge on the GDFs.
density_gdf = multimerge(orig_gdf, gdf_count_list, on_col, how).fillna(0)
# Create new total_count_col with dtype float (fill in with 0.0).
density_gdf[total_count_col] = 0.0
# Add count columns.
for i in range(len(count_cols)):
density_gdf[total_count_col] = density_gdf[total_count_col] + density_gdf[count_cols[i]]
# Calculate density.
density_gdf[density_col] = density_gdf[total_count_col]/density_gdf.area*(10**6)
return density_gdf
# Use calc_density to calculate density of food establishments per ZCTA.
food_gdfs = [zcta_grocery_count, zcta_prep_food_count, zcta_farmer_mrkt_count]
food_cols = ['grocery_count', 'prep_food_count', 'farmer_mrkt_count']
zcta_food = calc_density(rt_zcta, food_gdfs, ['ZCTA5CE00'], 'left', food_cols, 'food_count', 'food_density')
# View five most food-dense ZCTAs.
zcta_food.sort_values(by='food_density', ascending=False).head()
| index | STATEFP00 | ZCTA5CE00 | GEOID00 | CLASSFP00 | MTFCC00 | FUNCSTAT00 | ALAND00 | AWATER00 | INTPTLAT00 | INTPTLON00 | PARTFLG00 | geometry | grocery_count | prep_food_count | farmer_mrkt_count | food_count | food_density | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 55 | 437 | 25 | 02113 | 2502113 | B5 | G6350 | S | 272995 | 0 | +42.3651696 | -071.0553627 | N | POLYGON ((236403.940 901958.860, 236443.429 90... | 4.0 | 43.0 | 0.0 | 47.0 | 172.176956 |
| 42 | 308 | 25 | 02199 | 2502199 | B5 | G6350 | S | 149678 | 0 | +42.3474648 | -071.0820577 | N | POLYGON ((234168.090 899801.270, 234292.861 89... | 1.0 | 21.0 | 0.0 | 22.0 | 146.992162 |
| 61 | 495 | 25 | 02222 | 2502222 | B5 | G6350 | S | 77896 | 0 | +42.3664695 | -071.0624939 | N | POLYGON ((235839.790 902134.685, 235852.488 90... | 2.0 | 7.0 | 0.0 | 9.0 | 115.575988 |
| 48 | 377 | 25 | 02110 | 2502110 | B5 | G6350 | S | 428520 | 0 | +42.3573712 | -071.0531804 | N | POLYGON ((236791.320 900796.888, 236751.472 90... | 1.0 | 41.0 | 0.0 | 42.0 | 99.910762 |
| 33 | 253 | 25 | 02109 | 2502109 | B5 | G6350 | S | 427819 | 0 | +42.3626531 | -071.0538044 | N | MULTIPOLYGON (((236437.338 900792.044, 236429.... | 1.0 | 35.0 | 0.0 | 36.0 | 85.443766 |
# View stats for food_density
zcta_food.food_density.describe()
count 62.000000 mean 21.372276 std 37.098875 min 0.000000 25% 1.550315 50% 5.130476 75% 20.625985 max 172.176956 Name: food_density, dtype: float64
# View histogram for distribution of food_density.
sns.histplot(data=zcta_food, x='food_density')
plt.title('Distribution of Food Establishment Density per ZCTA in Boston Area')
plt.show()
# Map the density of food establishments in each ZCTA.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
zcta_food.plot(column='food_density',
legend=True,
edgecolor='black',
cmap='YlGnBu',
legend_kwds={'label': "Food establishments per sqkm"},
ax=ax)
tufts_bu.plot(ax=ax, color='maroon', markersize=50, zorder=10, label='Tufts and BU')
tufts_bu_paths.plot(ax=ax, color='maroon', alpha=0.5, label='Shortest Paths')
plt.title('Density of Food Establishments by ZCTA in Boston Area', fontsize=16)
plt.legend()
plt.show()
The Downtown Boston area evidently has the most food establishments per square kilometer while outer ZCTAs have fewer. The small food-dense ZCTA near BU is ZCTA 02199, the Prudential Center.
# Calculate values to separate food density into five quantiles.
food_quints = quantiles(zcta_food, 'food_density', quintiles)
food_quints
[1.4331169108386328, 2.3792512364611573, 8.133602216983999, 28.381466858715]
# Reclassify food density with reclass_5 function and quintile values.
zcta_food['food_reclass'] = zcta_food['food_density'].apply(lambda x: reclass_5(x, food_quints, 'high'))
# View top and bottom five food dense ZCTAs.
zcta_food.sort_values(by='food_reclass', ascending=False)
| index | STATEFP00 | ZCTA5CE00 | GEOID00 | CLASSFP00 | MTFCC00 | FUNCSTAT00 | ALAND00 | AWATER00 | INTPTLAT00 | INTPTLON00 | PARTFLG00 | geometry | grocery_count | prep_food_count | farmer_mrkt_count | food_count | food_density | food_reclass | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 61 | 495 | 25 | 02222 | 2502222 | B5 | G6350 | S | 77896 | 0 | +42.3664695 | -071.0624939 | N | POLYGON ((235839.790 902134.685, 235852.488 90... | 2.0 | 7.0 | 0.0 | 9.0 | 115.575988 | 5 |
| 17 | 115 | 25 | 02115 | 2502115 | B5 | G6350 | S | 1979415 | 162098 | +42.3375449 | -071.1061732 | N | POLYGON ((233785.591 900215.963, 233773.509 90... | 6.0 | 81.0 | 1.0 | 88.0 | 41.095138 | 5 |
| 18 | 116 | 25 | 02210 | 2502210 | B5 | G6350 | S | 2023105 | 0 | +42.3476816 | -071.0417309 | N | MULTIPOLYGON (((236771.808 900243.394, 236722.... | 2.0 | 56.0 | 1.0 | 59.0 | 29.608405 | 5 |
| 48 | 377 | 25 | 02110 | 2502110 | B5 | G6350 | S | 428520 | 0 | +42.3573712 | -071.0531804 | N | POLYGON ((236791.320 900796.888, 236751.472 90... | 1.0 | 41.0 | 0.0 | 42.0 | 99.910762 | 5 |
| 53 | 435 | 25 | 02116 | 2502116 | B5 | G6350 | S | 1712064 | 222385 | +42.3499907 | -071.0760802 | N | POLYGON ((234702.966 899809.334, 234675.900 89... | 4.0 | 136.0 | 1.0 | 141.0 | 72.893469 | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 24 | 226 | 25 | 02468 | 2502468 | B5 | G6350 | S | 4120625 | 57832 | +42.3291748 | -071.2315831 | N | POLYGON ((220628.408 897708.340, 220639.271 89... | 0.0 | 1.0 | 0.0 | 1.0 | 0.239338 | 1 |
| 25 | 229 | 25 | 02460 | 2502460 | B5 | G6350 | S | 3651425 | 20856 | +42.3520061 | -071.2092424 | N | POLYGON ((224712.729 900305.384, 224780.933 90... | 2.0 | 2.0 | 0.0 | 4.0 | 1.089313 | 1 |
| 35 | 285 | 25 | 02459 | 2502459 | B5 | G6350 | S | 12417620 | 237715 | +42.3142917 | -071.1946111 | N | MULTIPOLYGON (((226381.301 895402.075, 226417.... | 1.0 | 12.0 | 1.0 | 14.0 | 1.106327 | 1 |
| 28 | 248 | 25 | 02136 | 2502136 | B5 | G6350 | S | 11889865 | 266448 | +42.2550833 | -071.1292203 | N | POLYGON ((232000.362 889791.619, 232054.679 88... | 3.0 | 7.0 | 0.0 | 10.0 | 0.825368 | 1 |
| 0 | 31 | 25 | 02458 | 2502458 | B5 | G6350 | S | 4709183 | 57045 | +42.3541154 | -071.1891300 | N | MULTIPOLYGON (((227500.026 901300.219, 227482.... | 0.0 | 3.0 | 0.0 | 3.0 | 0.629469 | 1 |
62 rows × 19 columns
# Map the reclassified food density in each ZCTA.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
zcta_food.plot(column='food_reclass',
legend=True,
edgecolor='black',
cmap='plasma_r',
legend_kwds={'label': "Reclassified food density"},
ax=ax)
tufts_bu.plot(ax=ax, color='aquamarine', markersize=50, zorder=10, label='Tufts and BU')
tufts_bu_paths.plot(ax=ax, color='aquamarine', alpha=0.5, label='Shortest Paths')
plt.title('Reclassified Density of Food Establishments by ZCTA in Boston Area', fontsize=16)
plt.legend()
plt.show()
# View ZCTAs with the highest food density (over 28 establishments/sqkm).
zcta_food.loc[zcta_food.food_reclass==5]['ZCTA5CE00']
17 02115 18 02210 33 02109 40 02215 42 02199 48 02110 50 02114 53 02116 55 02113 56 02108 57 02139 59 02111 61 02222 Name: ZCTA5CE00, dtype: object
After reclassification, the distribution of food-dense ZCTAs has evened out. The central area of the map still has the highest concentration of food density, but class 4 (having between 8 and 28 food establishments per sqkm) covers a large area as well.
Health services were defined as Community Health Centers and Hospitals from MassGIS and found with the appropriate OSM tags for healthcare and amenities, such as clinics and doctors offices. Social services facilities and veterinary servies were omitted.
# read_n_clip Community Health Centers shapefile from MassGIS and view info.
comm_health = read_n_clip('./data/chcs/CHCS_PT.shp', rt_zcta)
comm_health.info()
<class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 84 entries, 1 to 194 Data columns (total 18 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 SITE_NAME 84 non-null object 1 SITE_TYPE 84 non-null object 2 ADMIN_ONLY 84 non-null object 3 ADDRESS 84 non-null object 4 ADDRESS_OT 16 non-null object 5 MAIL_CITY 84 non-null object 6 ZIP 84 non-null object 7 PO_BOX 0 non-null object 8 TOWN 84 non-null object 9 TOWN_ID 84 non-null int64 10 SATELLITE 84 non-null object 11 EYE 84 non-null object 12 DENTAL 84 non-null object 13 METHOD 84 non-null object 14 GIS_ID 71 non-null object 15 MCHC_CODE 84 non-null object 16 MAD_ID 84 non-null int64 17 geometry 84 non-null geometry dtypes: geometry(1), int64(2), object(15) memory usage: 12.5+ KB
# View first five rows of comm_health.
comm_health.head()
| SITE_NAME | SITE_TYPE | ADMIN_ONLY | ADDRESS | ADDRESS_OT | MAIL_CITY | ZIP | PO_BOX | TOWN | TOWN_ID | SATELLITE | EYE | DENTAL | METHOD | GIS_ID | MCHC_CODE | MAD_ID | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Chinatown Clinic | NOS | N | 885 Washington Street | None | Boston | 02111 | None | BOSTON | 35 | Y | N | N | G | S99 | SCCHC | 35145770 | POINT (235848.295 899811.714) |
| 3 | St. Francis House Clinic | HOMELESS | N | 39 Boylston Street | None | Boston | 02116 | None | BOSTON | 35 | Y | N | N | G | S101 | BHCHP | 35018316 | POINT (235955.316 900313.222) |
| 8 | Teen Health Center at Cambridge Rindge and Lat... | SCHOOL | N | 459 Broadway | None | Cambridge | 02138 | None | CAMBRIDGE | 49 | Y | N | N | G | S106 | CHA | 3624789 | POINT (231966.092 902732.932) |
| 9 | The Great Hall | NOS | N | 6 Norfolk Street | None | Dorchester | 02124 | None | BOSTON | 35 | Y | N | N | G | S107 | CSHC | 35103391 | POINT (235304.694 893366.484) |
| 11 | Upham's Corner Adolescent Health & WIC | NOS | N | 500 Columbia Road | None | Dorchester | 02125 | None | BOSTON | 35 | Y | N | N | G | S111 | UCHC | 35035283 | POINT (235647.822 896122.920) |
# Use count_records on comm_health.
zcta_comm_health_count = count_records(comm_health, rt_zcta, 'ZCTA5CE00', 'comm_health_count')
zcta_comm_health_count
| ZCTA5CE00 | comm_health_count | |
|---|---|---|
| 0 | 02139 | 7 |
| 1 | 02125 | 7 |
| 2 | 02149 | 5 |
| 3 | 02111 | 5 |
| 4 | 02151 | 5 |
| 5 | 02124 | 4 |
| 6 | 02118 | 4 |
| 7 | 02126 | 3 |
| 8 | 02130 | 3 |
| 9 | 02150 | 3 |
| 10 | 02171 | 3 |
| 11 | 02143 | 3 |
| 12 | 02129 | 3 |
| 13 | 02135 | 2 |
| 14 | 02215 | 2 |
| 15 | 02127 | 2 |
| 16 | 02122 | 2 |
| 17 | 02169 | 2 |
| 18 | 02113 | 2 |
| 19 | 02134 | 2 |
| 20 | 02145 | 2 |
| 21 | 02114 | 1 |
| 22 | 02141 | 1 |
| 23 | 02148 | 1 |
| 24 | 02128 | 1 |
| 25 | 02140 | 1 |
| 26 | 02210 | 1 |
| 27 | 02138 | 1 |
| 28 | 02120 | 1 |
| 29 | 02116 | 1 |
| 30 | 02119 | 1 |
| 31 | 02121 | 1 |
| 32 | 02131 | 1 |
| 33 | 02186 | 1 |
# read_n_clip Hospitals shapefile from MassGIS and view info.
hospitals = read_n_clip('./data/acute_care_hospitals/HOSPITALS_PT.shp', rt_zcta)
hospitals.info()
<class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 20 entries, 0 to 68 Data columns (total 22 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 IDNUMBER 20 non-null object 1 DPHID 20 non-null int64 2 NAME 20 non-null object 3 SHORTNAME 20 non-null object 4 ADDRESS 20 non-null object 5 TOWN 20 non-null object 6 GEOG_TOWN 20 non-null object 7 ZIPCODE 20 non-null object 8 CHIAREGION 20 non-null object 9 TELEPHONE 20 non-null object 10 COHORT 20 non-null object 11 HOSPSYSTEM 20 non-null object 12 TAXSTATUS 20 non-null object 13 BEDCOUNT 20 non-null int64 14 ER_STATUS 20 non-null object 15 TRAUMA_ADU 5 non-null object 16 TRAUMA_PED 4 non-null object 17 SPEPUBFUND 18 non-null object 18 FYE 16 non-null object 19 MADID 20 non-null int64 20 EMSREGION 20 non-null int64 21 geometry 20 non-null geometry dtypes: geometry(1), int64(4), object(17) memory usage: 3.6+ KB
# View first five rows of hospitals.
hospitals.head()
| IDNUMBER | DPHID | NAME | SHORTNAME | ADDRESS | TOWN | GEOG_TOWN | ZIPCODE | CHIAREGION | TELEPHONE | ... | TAXSTATUS | BEDCOUNT | ER_STATUS | TRAUMA_ADU | TRAUMA_PED | SPEPUBFUND | FYE | MADID | EMSREGION | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2069 | 2069 | Beth Israel Deaconess Medical Center East | BIDMC East | 330 Brookline Avenue | Boston | BOSTON | 02215 | Metro Boston | (617) 667-7000 | ... | Non-profit | 248 | N | None | None | Not Applicable | 2017-09-30 | 35022096 | 4 | POINT (232494.091 898890.122) |
| 1 | 2085 | 2085 | Steward St. Elizabeth's Medical Center | St. Elizabeth's Medical Center | 736 Cambridge Street | Brighton | BOSTON | 02135 | Metro Boston | (617) 789-3000 | ... | For profit | 252 | Y | None | None | Not Applicable | 2017-12-31 | 35156980 | 4 | POINT (229003.958 899958.291) |
| 2 | 2299 | 2299 | Tufts Medical Center | Tufts Medical Center | 800 Washington Street | Boston | BOSTON | 02111 | Metro Boston | (617) 636-5000 | ... | Non-profit | 415 | Y | Level 1 | Level 1 | Not Applicable | 2017-09-30 | 35145764 | 4 | POINT (235963.109 899998.149) |
| 4 | 2167 | 2167 | Massachusetts Eye and Ear Infirmary | Massachusetts Eye and Ear Infirmary | 243 Charles Street | Boston | BOSTON | 02114 | Metro Boston | (617) 523-7900 | ... | Non-profit | 41 | Y | None | None | Not Applicable | 2017-09-30 | 35163139 | 4 | POINT (235396.614 901452.905) |
| 5 | 2316 | 2316 | Shriners Hospital For Children - Boston | Shriners Boston | 51 Blossom Street | Boston | BOSTON | 02114 | Metro Boston | (617) 722-3000 | ... | Non-profit | 30 | N | None | None | Not Applicable | 2017-12-31 | 35016066 | 4 | POINT (235713.701 901489.419) |
5 rows × 22 columns
# Use count_records on hospitals.
zcta_hospitals_count = count_records(hospitals, rt_zcta, 'ZCTA5CE00', 'hospitals_count')
zcta_hospitals_count
| ZCTA5CE00 | hospitals_count | |
|---|---|---|
| 0 | 02114 | 3 |
| 1 | 02115 | 3 |
| 2 | 02215 | 2 |
| 3 | 02118 | 2 |
| 4 | 02130 | 1 |
| 5 | 02124 | 1 |
| 6 | 02149 | 1 |
| 7 | 02120 | 1 |
| 8 | 02155 | 1 |
| 9 | 02135 | 1 |
| 10 | 02186 | 1 |
| 11 | 02139 | 1 |
| 12 | 02138 | 1 |
| 13 | 02111 | 1 |
# Retrieve healthcare features within graph_extent_latlong from OSMnx and view info.
health_tags = {'healthcare':True, 'amenity':['clinic', 'doctors', 'dentist', 'health_post', 'pharmacy']}
healthcare = ox.geometries_from_polygon(graph_extent_latlong, health_tags)
healthcare = convert_n_clip(healthcare, rt_zcta)
healthcare.info()
<class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 306 entries, 0 to 356 Data columns (total 100 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 unique_id 306 non-null object 1 osmid 306 non-null int64 2 element_type 306 non-null object 3 attribution 1 non-null object 4 source 11 non-null object 5 geometry 306 non-null geometry 6 addr:city 164 non-null object 7 addr:housenumber 211 non-null object 8 addr:state 123 non-null object 9 addr:street 212 non-null object 10 amenity 262 non-null object 11 healthcare 215 non-null object 12 name 298 non-null object 13 addr:postcode 143 non-null object 14 emergency 28 non-null object 15 emergency_room 10 non-null object 16 healthcare:speciality 56 non-null object 17 opening_hours 76 non-null object 18 operator 54 non-null object 19 phone 119 non-null object 20 website 119 non-null object 21 wikidata 27 non-null object 22 wikipedia 26 non-null object 23 gnis:feature_id 24 non-null object 24 short_name 43 non-null object 25 ele 4 non-null object 26 gnis:county_name 3 non-null object 27 gnis:import_uuid 3 non-null object 28 gnis:reviewed 3 non-null object 29 brand 60 non-null object 30 brand:wikidata 56 non-null object 31 brand:wikipedia 56 non-null object 32 dispensing 34 non-null object 33 drive_through 16 non-null object 34 wheelchair 25 non-null object 35 amenity_1 3 non-null object 36 shop 3 non-null object 37 payment:apple_pay 1 non-null object 38 payment:cash 4 non-null object 39 payment:mastercard 4 non-null object 40 payment:snap 1 non-null object 41 payment:visa 5 non-null object 42 payment:wic 1 non-null object 43 self_checkout 2 non-null object 44 tty 2 non-null object 45 urgent_care 4 non-null object 46 addr:housename 5 non-null object 47 contact:phone 2 non-null object 48 contact:website 2 non-null object 49 branch 1 non-null object 50 note 3 non-null object 51 opening_hours:covid19 8 non-null object 52 start_date 5 non-null object 53 payment:discover_card 3 non-null object 54 addr:unit 22 non-null object 55 atm 1 non-null object 56 description 7 non-null object 57 contact:fax 1 non-null object 58 ref 2 non-null object 59 opening_hours:signed 1 non-null object 60 email 31 non-null object 61 name:en 3 non-null object 62 operator:type 4 non-null object 63 addr:country 6 non-null object 64 level 18 non-null object 65 contact:facebook 6 non-null object 66 payment:american_express 1 non-null object 67 payment:cheque 2 non-null object 68 fax 14 non-null object 69 fixme 1 non-null object 70 healthcare:counselling 1 non-null object 71 payment:credit_cards 1 non-null object 72 contact:instagram 4 non-null object 73 office 1 non-null object 74 contact:github 1 non-null object 75 contact:twitter 1 non-null object 76 addr:floor 1 non-null object 77 payment:paypal 1 non-null object 78 nodes 99 non-null object 79 building 67 non-null object 80 building:levels 11 non-null object 81 massgis:id 17 non-null object 82 roof:levels 1 non-null object 83 roof:shape 3 non-null object 84 gnis:county_id 1 non-null object 85 gnis:created 1 non-null object 86 gnis:state_id 1 non-null object 87 alt_name 3 non-null object 88 landuse 1 non-null object 89 campusbuilding 1 non-null object 90 disused:amenity 2 non-null object 91 old_name 5 non-null object 92 internet_access 1 non-null object 93 beds 6 non-null object 94 height 1 non-null object 95 access 1 non-null object 96 ways 4 non-null object 97 building:material 1 non-null object 98 roof:material 1 non-null object 99 type 4 non-null object dtypes: geometry(1), int64(1), object(98) memory usage: 241.5+ KB
# View first five rows of healthcare GDF.
healthcare.head()
| unique_id | osmid | element_type | attribution | source | geometry | addr:city | addr:housenumber | addr:state | addr:street | ... | disused:amenity | old_name | internet_access | beds | height | access | ways | building:material | roof:material | type | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | node/61348061 | 61348061 | node | Office of Geographic and Environmental Informa... | massgis_import_v0.1_20071008193615 | POINT (228225.230 899815.766) | Brighton | 438 | MA | Washington Street | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | node/257489648 | 257489648 | node | NaN | NaN | POINT (235923.447 900015.140) | Boston | 755 | NaN | Washington Street | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | node/358278209 | 358278209 | node | NaN | NaN | POINT (233490.267 891124.319) | Mattapan | 1575 | NaN | Blue Hill Avenue | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | node/367777794 | 367777794 | node | NaN | USGS Geonames | POINT (232386.086 898796.067) | NaN | NaN | MA | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | node/391429098 | 391429098 | node | NaN | NaN | POINT (228374.893 904620.489) | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 100 columns
# Use count_records on healthcare.
zcta_healthcare_count = count_records(healthcare, rt_zcta, 'ZCTA5CE00', 'healthcare_count')
zcta_healthcare_count
| ZCTA5CE00 | healthcare_count | |
|---|---|---|
| 0 | 02116 | 35 |
| 1 | 02130 | 19 |
| 2 | 02115 | 18 |
| 3 | 02135 | 13 |
| 4 | 02143 | 12 |
| 5 | 02138 | 12 |
| 6 | 02139 | 12 |
| 7 | 02215 | 11 |
| 8 | 02155 | 10 |
| 9 | 02114 | 9 |
| 10 | 02140 | 8 |
| 11 | 02128 | 7 |
| 12 | 02144 | 7 |
| 13 | 02445 | 7 |
| 14 | 02148 | 7 |
| 15 | 02145 | 7 |
| 16 | 02446 | 7 |
| 17 | 02141 | 6 |
| 18 | 02169 | 6 |
| 19 | 02474 | 6 |
| 20 | 02184 | 6 |
| 21 | 02134 | 5 |
| 22 | 02119 | 5 |
| 23 | 02132 | 5 |
| 24 | 02118 | 4 |
| 25 | 02111 | 4 |
| 26 | 02124 | 4 |
| 27 | 02478 | 4 |
| 28 | 02120 | 4 |
| 29 | 02150 | 3 |
| 30 | 02108 | 3 |
| 31 | 02126 | 3 |
| 32 | 02129 | 3 |
| 33 | 02472 | 3 |
| 34 | 02136 | 3 |
| 35 | 02459 | 3 |
| 36 | 02142 | 3 |
| 37 | 02461 | 2 |
| 38 | 02466 | 2 |
| 39 | 02171 | 2 |
| 40 | 02109 | 2 |
| 41 | 02127 | 2 |
| 42 | 02149 | 2 |
| 43 | 02113 | 2 |
| 44 | 02121 | 1 |
| 45 | 02210 | 1 |
| 46 | 02465 | 1 |
| 47 | 02151 | 1 |
| 48 | 02199 | 1 |
# Use calc_density to calculate density of health services per ZCTA.
health_gdfs = [zcta_comm_health_count, zcta_hospitals_count, zcta_healthcare_count]
health_cols = ['comm_health_count', 'hospitals_count', 'healthcare_count']
zcta_health = calc_density(rt_zcta, health_gdfs, ['ZCTA5CE00'], 'left', health_cols, 'health_count', 'health_density')
# View five most health-dense ZCTAs.
zcta_health.sort_values(by='health_density', ascending=False).head()
| index | STATEFP00 | ZCTA5CE00 | GEOID00 | CLASSFP00 | MTFCC00 | FUNCSTAT00 | ALAND00 | AWATER00 | INTPTLAT00 | INTPTLON00 | PARTFLG00 | geometry | comm_health_count | hospitals_count | healthcare_count | health_count | health_density | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 53 | 435 | 25 | 02116 | 2502116 | B5 | G6350 | S | 1712064 | 222385 | +42.3499907 | -071.0760802 | N | POLYGON ((234702.966 899809.334, 234675.900 89... | 1.0 | 0.0 | 35.0 | 36.0 | 18.611098 |
| 55 | 437 | 25 | 02113 | 2502113 | B5 | G6350 | S | 272995 | 0 | +42.3651696 | -071.0553627 | N | POLYGON ((236403.940 901958.860, 236443.429 90... | 2.0 | 0.0 | 2.0 | 4.0 | 14.653358 |
| 59 | 493 | 25 | 02111 | 2502111 | B5 | G6350 | S | 754214 | 79209 | +42.3487843 | -071.0589880 | N | POLYGON ((236144.915 900687.565, 236153.840 90... | 5.0 | 1.0 | 4.0 | 10.0 | 13.180128 |
| 17 | 115 | 25 | 02115 | 2502115 | B5 | G6350 | S | 1979415 | 162098 | +42.3375449 | -071.1061732 | N | POLYGON ((233785.591 900215.963, 233773.509 90... | 0.0 | 3.0 | 18.0 | 21.0 | 9.806794 |
| 50 | 379 | 25 | 02114 | 2502114 | B5 | G6350 | S | 1065352 | 343761 | +42.3631745 | -071.0686463 | N | MULTIPOLYGON (((234691.735 900849.190, 234663.... | 1.0 | 3.0 | 9.0 | 13.0 | 9.662174 |
# View stats for health_density
zcta_health.health_density.describe()
count 62.000000 mean 2.362879 std 3.733892 min 0.000000 25% 0.278843 50% 0.828034 75% 2.455580 max 18.611098 Name: health_density, dtype: float64
# View histogram for distribution of health_density.
sns.histplot(data=zcta_health, x='health_density')
plt.title('Distribution of Health Services Density per ZCTA in Boston Area')
plt.show()
# Map the density of health services in each ZCTA.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
zcta_health.plot(column='health_density',
legend=True,
edgecolor='black',
cmap='YlGnBu',
legend_kwds={'label': "Health services per sqkm"},
ax=ax)
tufts_bu.plot(ax=ax, color='maroon', markersize=50, zorder=10, label='Tufts and BU')
tufts_bu_paths.plot(ax=ax, color='maroon', alpha=0.5, label='Shortest Paths')
plt.title('Density of Health Services by ZCTA in Boston Area', fontsize=16)
plt.legend()
plt.show()
# Calculate values to separate health density into five quantiles.
health_quints = quantiles(zcta_health, 'health_density', quintiles)
health_quints
[0.1986800423319442, 0.6037528931486905, 1.3204051899721023, 3.7225941851306605]
# Reclassify health density with reclass_5 function and quintile values.
zcta_health['health_reclass'] = zcta_health['health_density'].apply(lambda x: reclass_5(x, health_quints, 'high'))
# View top and bottom five health-service-dense ZCTAs.
zcta_health.sort_values(by='health_reclass', ascending=False)
| index | STATEFP00 | ZCTA5CE00 | GEOID00 | CLASSFP00 | MTFCC00 | FUNCSTAT00 | ALAND00 | AWATER00 | INTPTLAT00 | INTPTLON00 | PARTFLG00 | geometry | comm_health_count | hospitals_count | healthcare_count | health_count | health_density | health_reclass | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 55 | 437 | 25 | 02113 | 2502113 | B5 | G6350 | S | 272995 | 0 | +42.3651696 | -071.0553627 | N | POLYGON ((236403.940 901958.860, 236443.429 90... | 2.0 | 0.0 | 2.0 | 4.0 | 14.653358 | 5 |
| 57 | 482 | 25 | 02139 | 2502139 | B5 | G6350 | S | 3784723 | 368360 | +42.3622881 | -071.1037905 | N | POLYGON ((232130.866 902283.249, 232144.254 90... | 7.0 | 1.0 | 12.0 | 20.0 | 4.816014 | 5 |
| 56 | 438 | 25 | 02108 | 2502108 | B5 | G6350 | S | 721093 | 0 | +42.3575539 | -071.0639133 | N | POLYGON ((235436.185 900908.999, 235400.476 90... | 0.0 | 0.0 | 3.0 | 3.0 | 4.160645 | 5 |
| 59 | 493 | 25 | 02111 | 2502111 | B5 | G6350 | S | 754214 | 79209 | +42.3487843 | -071.0589880 | N | POLYGON ((236144.915 900687.565, 236153.840 90... | 5.0 | 1.0 | 4.0 | 10.0 | 13.180128 | 5 |
| 50 | 379 | 25 | 02114 | 2502114 | B5 | G6350 | S | 1065352 | 343761 | +42.3631745 | -071.0686463 | N | MULTIPOLYGON (((234691.735 900849.190, 234663.... | 1.0 | 3.0 | 9.0 | 13.0 | 9.662174 | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 22 | 176 | 25 | 02163 | 2502163 | B5 | G6350 | S | 316665 | 32165 | +42.3661684 | -071.1228503 | N | POLYGON ((231019.106 902090.531, 231035.630 90... | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 1 |
| 21 | 171 | 25 | 02184 | 2502184 | B5 | G6350 | S | 35903196 | 2081541 | +42.2054158 | -071.0021680 | N | MULTIPOLYGON (((237408.847 882805.326, 237292.... | 0.0 | 0.0 | 6.0 | 6.0 | 0.159571 | 1 |
| 15 | 103 | 25 | 02170 | 2502170 | B5 | G6350 | S | 5031241 | 25238 | +42.2654801 | -071.0171436 | N | MULTIPOLYGON (((238632.563 889957.081, 238619.... | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 1 |
| 4 | 39 | 25 | 02464 | 2502464 | B5 | G6350 | S | 1447706 | 29494 | +42.3129751 | -071.2188818 | N | POLYGON ((223497.366 896100.959, 223538.895 89... | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 1 |
| 61 | 495 | 25 | 02222 | 2502222 | B5 | G6350 | S | 77896 | 0 | +42.3664695 | -071.0624939 | N | POLYGON ((235839.790 902134.685, 235852.488 90... | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 1 |
62 rows × 19 columns
# Map the average rent price in each ZCTA.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
zcta_health.plot(column='health_reclass',
legend=True,
edgecolor='black',
cmap='plasma_r',
legend_kwds={'label': "Reclassified health density"},
ax=ax)
tufts_bu.plot(ax=ax, color='aquamarine', markersize=50, zorder=10, label='Tufts and BU')
tufts_bu_paths.plot(ax=ax, color='aquamarine', alpha=0.5, label='Shortest Paths')
plt.title('Reclassified Density of Health Services by ZCTA in Boston Area', fontsize=16)
plt.legend()
plt.show()
# View ZCTAs with the highest health density (over 3.7 establishments/sqkm).
zcta_health.loc[zcta_health.health_reclass==5]['ZCTA5CE00']
1 02141 2 02143 17 02115 33 02109 40 02215 42 02199 50 02114 51 02120 53 02116 55 02113 56 02108 57 02139 59 02111 Name: ZCTA5CE00, dtype: object
This reclassified health density map looks very similar to the reclassified food density map. Both food and health are most dense in the central part of the Boston area.
Public services were defined in terms of safety (fire and police) and availability of public resources (USPS Post Offices and libraries). Fire stations, police stations, and libraries were obtained from MassGIS. USPS data was obtained from USPS as a CSV of Destination Delivery Units (DDUs) in a specified map extent.
# read_n_clip firestations shapefile from MassGIS and view info.
fire = read_n_clip('./data/firestations_pt/FIRESTATIONS_PT_MEMA.shp', rt_zcta)
fire.info()
<class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 103 entries, 5 to 710 Data columns (total 10 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 NAME 103 non-null object 1 OFFICE 103 non-null object 2 ADDRESS 103 non-null object 3 CITY 103 non-null object 4 STATE 103 non-null object 5 ZIP 103 non-null object 6 L_SRC 103 non-null object 7 SOURCE 103 non-null object 8 L_DATE 103 non-null object 9 geometry 103 non-null geometry dtypes: geometry(1), object(9) memory usage: 8.9+ KB
# View first five rows of fire GDF.
fire.head()
| NAME | OFFICE | ADDRESS | CITY | STATE | ZIP | L_SRC | SOURCE | L_DATE | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | MILTON FIRE DEPARTMENT | DEPT E 4 | 815 Blue Hill Ave. | MILTON | MA | 01286 | MAPC | MEMA | 2006-08-16 | POINT (232280.004 887730.431) |
| 6 | MILTON FIRE DEPARTMENT | Station 1- Headquarters | 515 Canton Avenue | MILTON | MA | 02186 | MAPC | MEMA | 2006-08-16 | POINT (234642.734 889410.311) |
| 7 | MILTON FIRE DEPARTMENT | Station E 2 | 525 Adams Street | MILTON | MA | 02186 | MAPC | MEMA | 2006-08-16 | POINT (237885.224 889826.071) |
| 38 | REVERE FIRE DEPARTMENT | Engine 1 | 13 Walden Street | REVERE | MA | 02151 | MAPC | MEMA | 2006-08-16 | POINT (241572.264 906537.411) |
| 39 | REVERE FIRE DEPARTMENT | Engine 3 | 929 Winthrop Avenue | REVERE | MA | 02151 | MAPC | MEMA | 2006-08-16 | POINT (242387.724 904981.381) |
# Use count_records on fire.
zcta_fire_count = count_records(fire, rt_zcta, 'ZCTA5CE00', 'fire_count')
zcta_fire_count
| ZCTA5CE00 | fire_count | |
|---|---|---|
| 0 | 02155 | 6 |
| 1 | 02169 | 5 |
| 2 | 02151 | 5 |
| 3 | 02445 | 4 |
| 4 | 02472 | 3 |
| 5 | 02139 | 3 |
| 6 | 02186 | 3 |
| 7 | 02118 | 3 |
| 8 | 02124 | 3 |
| 9 | 02149 | 3 |
| 10 | 02459 | 3 |
| 11 | 02128 | 3 |
| 12 | 02184 | 3 |
| 13 | 02150 | 3 |
| 14 | 02143 | 3 |
| 15 | 02148 | 3 |
| 16 | 02138 | 3 |
| 17 | 02132 | 2 |
| 18 | 02127 | 2 |
| 19 | 02135 | 2 |
| 20 | 02122 | 2 |
| 21 | 02119 | 2 |
| 22 | 02136 | 2 |
| 23 | 02446 | 2 |
| 24 | 02129 | 2 |
| 25 | 02115 | 2 |
| 26 | 02171 | 2 |
| 27 | 02478 | 2 |
| 28 | 02111 | 2 |
| 29 | 02114 | 1 |
| 30 | 02131 | 1 |
| 31 | 02144 | 1 |
| 32 | 02170 | 1 |
| 33 | 02125 | 1 |
| 34 | 02116 | 1 |
| 35 | 02460 | 1 |
| 36 | 02474 | 1 |
| 37 | 02113 | 1 |
| 38 | 02121 | 1 |
| 39 | 02140 | 1 |
| 40 | 02145 | 1 |
| 41 | 02134 | 1 |
| 42 | 02458 | 1 |
| 43 | 02109 | 1 |
| 44 | 02465 | 1 |
| 45 | 02464 | 1 |
| 46 | 02141 | 1 |
| 47 | 02110 | 1 |
| 48 | 02130 | 1 |
# read_n_clip police stations shapefile from MassGIS and view info.
police = read_n_clip('./data/policestations/POLICESTATIONS_PT_MEMA.shp', rt_zcta)
police.info()
<class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 42 entries, 6 to 412 Data columns (total 10 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 NAME 42 non-null object 1 ADDRESS 42 non-null object 2 CITY 42 non-null object 3 STATE 42 non-null object 4 ZIP 42 non-null object 5 JURISDIC 42 non-null object 6 L_SRC 41 non-null object 7 SOURCE 42 non-null object 8 L_DATE 42 non-null object 9 geometry 42 non-null geometry dtypes: geometry(1), object(9) memory usage: 3.6+ KB
# View first five rows of police GDF.
police.head()
| NAME | ADDRESS | CITY | STATE | ZIP | JURISDIC | L_SRC | SOURCE | L_DATE | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|
| 6 | STATE POLICE STATION H-7, MILTON | 685 HILLSIDE STREET | MILTON | MA | 02186 | STATE | MAPC | MEMA | 2006-09-12 | POINT (233598.434 885007.271) |
| 42 | BOSTON POLICE DEPARTMENT DISTRICT E-5 | 1708 CENTRE STREET | BOSTON | MA | 02132 | LOCAL | MEMA | CITY | 2006-09-12 | POINT (229001.867 892986.812) |
| 43 | BOSTON POLICE DEPARTMENT DISTRICTS A-1 AND A-15 | 40 NEW SUDBURY STREET | BOSTON | MA | 02114 | LOCAL | MEMA | CITY | 2006-09-12 | POINT (236217.474 901349.051) |
| 44 | BOSTON POLICE DEPARTMENT DISTRICT B-2 | 2400 WASHINGTON STREET | BOSTON | MA | 02119 | LOCAL | MEMA | CITY | 2014-03-01 | POINT (234279.794 897624.521) |
| 45 | BOSTON POLICE DEPARTMENT DISTRICT C-11 | 40 GIBSON STREET | BOSTON | MA | 02122 | LOCAL | MEMA | CITY | 2006-09-12 | POINT (236357.494 894265.901) |
# Use count_records on police.
zcta_police_count = count_records(police, rt_zcta, 'ZCTA5CE00', 'police_count')
zcta_police_count
| ZCTA5CE00 | police_count | |
|---|---|---|
| 0 | 02118 | 3 |
| 1 | 02155 | 3 |
| 2 | 02210 | 3 |
| 3 | 02186 | 2 |
| 4 | 02128 | 2 |
| 5 | 02465 | 2 |
| 6 | 02151 | 2 |
| 7 | 02135 | 2 |
| 8 | 02114 | 2 |
| 9 | 02132 | 1 |
| 10 | 02130 | 1 |
| 11 | 02126 | 1 |
| 12 | 02122 | 1 |
| 13 | 02120 | 1 |
| 14 | 02141 | 1 |
| 15 | 02184 | 1 |
| 16 | 02125 | 1 |
| 17 | 02119 | 1 |
| 18 | 02149 | 1 |
| 19 | 02474 | 1 |
| 20 | 02150 | 1 |
| 21 | 02136 | 1 |
| 22 | 02472 | 1 |
| 23 | 02445 | 1 |
| 24 | 02108 | 1 |
| 25 | 02127 | 1 |
| 26 | 02478 | 1 |
| 27 | 02143 | 1 |
| 28 | 02148 | 1 |
| 29 | 02169 | 1 |
# Read in CSV files of USPS DDUs (Destination Delivery Units) obtained from USPS.
usps_df = pd.read_csv('./data/USPS DDUs.csv')
# Convert to GeoDataFrame, setting CRS to WGS 84 Pseudo Mercator EPSG:3857.
usps = gpd.GeoDataFrame(usps_df, geometry=gpd.points_from_xy(usps_df.x, usps_df.y))
usps = usps.set_crs('epsg:3857')
# View usps GDF.
usps
| AREA_ID | AREA_NAME | DISTRICT_ID | DISTRICT_NAME | LOCALE_KEY | LOCALE_NAME | FACILITY_TYPE | ADDRESS | CITY | STATE | ... | DISTRICTID | STFIPS | STATE_ABBR | CDFIPS | NAME | LAST_NAME | PARTY | x | y | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4B | NORTHEAST (B) | 20.0 | GREATER BOSTON | V27697 | BOYLSTON | POST_OFF | 67 SHREWSBURY ST | BOYLSTON | MA | ... | 2502.0 | 25.0 | MA | 2.0 | James P. McGovern | McGovern | Democrat | -7.985977e+06 | 5.211252e+06 | POINT (-7985976.557 5211251.950) |
| 1 | 4B | NORTHEAST (B) | 20.0 | GREATER BOSTON | V21844 | GREENDALE | POST_OFF | 290 W BOYLSTON ST | WORCESTER | MA | ... | 2502.0 | 25.0 | MA | 2.0 | James P. McGovern | McGovern | Democrat | -7.992721e+06 | 5.206060e+06 | POINT (-7992720.737 5206060.432) |
| 2 | 4B | NORTHEAST (B) | 20.0 | GREATER BOSTON | V21830 | GRAFTON | POST_OFF | 23 UPTON ST | GRAFTON | MA | ... | 2502.0 | 25.0 | MA | 2.0 | James P. McGovern | McGovern | Democrat | -7.979694e+06 | 5.191994e+06 | POINT (-7979693.796 5191993.618) |
| 3 | 4B | NORTHEAST (B) | 20.0 | GREATER BOSTON | V21975 | LEOMINSTER | POST_OFF | 68 MAIN ST | LEOMINSTER | MA | ... | 2502.0 | 25.0 | MA | 2.0 | James P. McGovern | McGovern | Democrat | -7.988160e+06 | 5.240422e+06 | POINT (-7988159.533 5240422.163) |
| 4 | 4B | NORTHEAST (B) | 20.0 | GREATER BOSTON | V27333 | MIDTOWN MALL | POST_OFF | 22 FRONT ST | WORCESTER | MA | ... | 2502.0 | 25.0 | MA | 2.0 | James P. McGovern | McGovern | Democrat | -7.992883e+06 | 5.200502e+06 | POINT (-7992883.375 5200502.471) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 234 | 4B | NORTHEAST (B) | 20.0 | GREATER BOSTON | 001823 | WEYMOUTH PHANTOM | NON_FAC | 103 WASHINGTON ST | WEYMOUTH | MA | ... | 2508.0 | 25.0 | MA | 8.0 | Stephen F. Lynch | Lynch | Democrat | -7.899976e+06 | 5.193719e+06 | POINT (-7899975.794 5193718.888) |
| 235 | 4B | ATLANTIC | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -7.915456e+06 | 5.213226e+06 | POINT (-7915455.605 5213226.381) |
| 236 | 4B | ATLANTIC | 22.0 | GREATER BOSTON | 019298 | KENMORE NEW | POST OFFICE | 512 PARK DR | BOSTON | MA | ... | NaN | NaN | MA | NaN | NaN | NaN | NaN | -7.915455e+06 | 5.213226e+06 | POINT (-7915455.273 5213225.551) |
| 237 | 4B | ATLANTIC | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -7.912334e+06 | 5.216294e+06 | POINT (-7912333.878 5216294.184) |
| 238 | 4B | ATLANTIC | 20.0 | GREATER BOSTON | 019091 | KENDALL CAMBRIDGESIDE | POST OFFICE | 91 1ST ST | CAMBRIDGE | MA | ... | NaN | NaN | MA | NaN | NaN | NaN | NaN | -7.912327e+06 | 5.216293e+06 | POINT (-7912327.309 5216293.188) |
239 rows × 27 columns
# convert_n_clip usps using rt_zcta.
usps = convert_n_clip(usps, rt_zcta)
usps.info()
<class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 101 entries, 44 to 238 Data columns (total 27 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 AREA_ID 101 non-null object 1 AREA_NAME 101 non-null object 2 DISTRICT_ID 99 non-null float64 3 DISTRICT_NAME 99 non-null object 4 LOCALE_KEY 99 non-null object 5 LOCALE_NAME 99 non-null object 6 FACILITY_TYPE 99 non-null object 7 ADDRESS 99 non-null object 8 CITY 99 non-null object 9 STATE 99 non-null object 10 ZIP_CODE 99 non-null float64 11 PLUS4_CODE 99 non-null float64 12 FAC_FACILITY_SUBTYPE 99 non-null object 13 LOC_TYPE 97 non-null object 14 Status 101 non-null object 15 Comments 5 non-null object 16 EditDate 99 non-null object 17 DISTRICTID 97 non-null float64 18 STFIPS 97 non-null float64 19 STATE_ABBR 99 non-null object 20 CDFIPS 97 non-null float64 21 NAME 97 non-null object 22 LAST_NAME 97 non-null object 23 PARTY 97 non-null object 24 x 101 non-null float64 25 y 101 non-null float64 26 geometry 101 non-null geometry dtypes: float64(8), geometry(1), object(18) memory usage: 22.1+ KB
# View first five rows of usps GDF.
usps.head()
| AREA_ID | AREA_NAME | DISTRICT_ID | DISTRICT_NAME | LOCALE_KEY | LOCALE_NAME | FACILITY_TYPE | ADDRESS | CITY | STATE | ... | DISTRICTID | STFIPS | STATE_ABBR | CDFIPS | NAME | LAST_NAME | PARTY | x | y | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 44 | 4B | NORTHEAST (B) | 20.0 | GREATER BOSTON | V22169 | NEWTON CENTER CARRIER ANNEX | POST_OFF | 211 SUMNER ST | NEWTON CENTER | MA | ... | 2504.0 | 25.0 | MA | 4.0 | Joseph P. Kennedy III | Kennedy | Democrat | -7.925125e+06 | 5.210698e+06 | POINT (225334.181 897885.006) |
| 46 | 4B | NORTHEAST (B) | 20.0 | GREATER BOSTON | 000308 | NEWTON CENTER | POST_OFF | 716 BEACON ST | NEWTON CENTER | MA | ... | 2504.0 | 25.0 | MA | 4.0 | Joseph P. Kennedy III | Kennedy | Democrat | -7.924838e+06 | 5.210626e+06 | POINT (225547.173 897832.905) |
| 47 | 4B | NORTHEAST (B) | 20.0 | GREATER BOSTON | V21442 | AUBURNDALE | POST_OFF | 2122 COMMONWEALTH AVE | AUBURNDALE | MA | ... | 2504.0 | 25.0 | MA | 4.0 | Joseph P. Kennedy III | Kennedy | Democrat | -7.931349e+06 | 5.213212e+06 | POINT (220720.874 899723.988) |
| 49 | 4B | NORTHEAST (B) | 20.0 | GREATER BOSTON | V21511 | BOSTON COLLEGE | POST_OFF | 136 COMMONWEALTH AVE | CHESTNUT HILL | MA | ... | 2504.0 | 25.0 | MA | 4.0 | Joseph P. Kennedy III | Kennedy | Democrat | -7.922825e+06 | 5.211033e+06 | POINT (227036.290 898138.159) |
| 50 | 4B | NORTHEAST (B) | 20.0 | GREATER BOSTON | V21558 | BROOKLINE | POST_OFF | 1295 BEACON ST | BROOKLINE | MA | ... | 2504.0 | 25.0 | MA | 4.0 | Joseph P. Kennedy III | Kennedy | Democrat | -7.917012e+06 | 5.212370e+06 | POINT (231335.285 899142.486) |
5 rows × 27 columns
# Use count_records on usps.
zcta_usps_count = count_records(usps, rt_zcta, 'ZCTA5CE00', 'usps_count')
zcta_usps_count
| ZCTA5CE00 | usps_count | |
|---|---|---|
| 0 | 02215 | 4 |
| 1 | 02155 | 4 |
| 2 | 02151 | 4 |
| 3 | 02459 | 3 |
| 4 | 02141 | 3 |
| 5 | 02150 | 3 |
| 6 | 02111 | 3 |
| 7 | 02139 | 3 |
| 8 | 02186 | 3 |
| 9 | 02472 | 3 |
| 10 | 02478 | 3 |
| 11 | 02136 | 3 |
| 12 | 02184 | 3 |
| 13 | 02474 | 3 |
| 14 | 02446 | 2 |
| 15 | 02138 | 2 |
| 16 | 02170 | 2 |
| 17 | 02114 | 2 |
| 18 | 02122 | 2 |
| 19 | 02149 | 2 |
| 20 | 02169 | 2 |
| 21 | 02467 | 2 |
| 22 | 02124 | 2 |
| 23 | 02120 | 2 |
| 24 | 02148 | 2 |
| 25 | 02143 | 2 |
| 26 | 02458 | 1 |
| 27 | 02145 | 1 |
| 28 | 02132 | 1 |
| 29 | 02144 | 1 |
| 30 | 02126 | 1 |
| 31 | 02128 | 1 |
| 32 | 02466 | 1 |
| 33 | 02115 | 1 |
| 34 | 02461 | 1 |
| 35 | 02116 | 1 |
| 36 | 02118 | 1 |
| 37 | 02445 | 1 |
| 38 | 02130 | 1 |
| 39 | 02121 | 1 |
| 40 | 02464 | 1 |
| 41 | 02468 | 1 |
| 42 | 02171 | 1 |
| 43 | 02131 | 1 |
| 44 | 02163 | 1 |
| 45 | 02134 | 1 |
| 46 | 02460 | 1 |
| 47 | 02113 | 1 |
| 48 | 02127 | 1 |
| 49 | 02199 | 1 |
| 50 | 02140 | 1 |
| 51 | 02108 | 1 |
| 52 | 02125 | 1 |
| 53 | 02142 | 1 |
| 54 | 02129 | 1 |
| 55 | 02465 | 1 |
| 56 | 02109 | 1 |
| 57 | 02135 | 1 |
# read_n_clip libraries shapefile from MassGIS and view info.
library = read_n_clip('./data/libraries/LIBRARIES_PT.shp', rt_zcta)
library.info()
<class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 59 entries, 16 to 463 Data columns (total 13 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 NAME 59 non-null object 1 ADDRESS 59 non-null object 2 TOWN 59 non-null object 3 STATE 59 non-null object 4 ZIP 59 non-null object 5 TYPE 59 non-null object 6 ADDRESS_PO 29 non-null object 7 LIBKEY 59 non-null object 8 NETWORK 54 non-null object 9 REGION 59 non-null object 10 DELIVROUTE 54 non-null object 11 WEBSITE 59 non-null object 12 geometry 59 non-null geometry dtypes: geometry(1), object(12) memory usage: 6.5+ KB
# View first five rows of library GDF.
library.head()
| NAME | ADDRESS | TOWN | STATE | ZIP | TYPE | ADDRESS_PO | LIBKEY | NETWORK | REGION | DELIVROUTE | WEBSITE | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16 | MILTON PUBLIC LIBRARY | 476 CANTON AVENUE | MILTON | MA | 02186 | PUBLIC | M_234870_889352 | PU-MILTON-PUBLIC | OCLN | Southeastern | Southeast Region | http://www.miltonlibrary.org | POINT (234870.094 889352.186) |
| 17 | MALDEN PUBLIC LIBRARY | 36 SALEM STREET | MALDEN | MA | 02148 | PUBLIC | M_235699_908691 | PU-MALDEN-PUBLIC | MBLN | Boston | Boston Region | http://www.maldenpubliclibrary.org/ | POINT (235698.507 908691.000) |
| 19 | THE PUBLIC LIBRARY OF BROOKLINE | 361 WASHINGTON STREET | BROOKLINE | MA | 02445 | PUBLIC | M_231196_898303 | PU-BROOKLIN-PUBLIC | MLN | Metrowest | Metrowest Region | http://www.brooklinelibrary.com | POINT (231196.352 898303.087) |
| 20 | CHELSEA PUBLIC LIBRARY | 569 BROADWAY | CHELSEA | MA | 02150 | PUBLIC | M_238532_904892 | PU-CHELSEA-PUBLIC | MBLN | Boston | Boston Region | http://www.ci.chelsea.ma.us/Public_Documents/C... | POINT (238531.652 904891.760) |
| 24 | REVERE PUBLIC LIBRARY | 179 BEACH STREET | REVERE | MA | 02151 | PUBLIC | M_240403_906504 | PU-REVERE-PUBLIC | NOBLE | Northeast | Northeast Region | http://www.reverepubliclibrary.org/ | POINT (240402.902 906504.436) |
# Use count_records on library.
zcta_library_count = count_records(library, rt_zcta, 'ZCTA5CE00', 'library_count')
zcta_library_count
| ZCTA5CE00 | library_count | |
|---|---|---|
| 0 | 02125 | 3 |
| 1 | 02138 | 3 |
| 2 | 02130 | 2 |
| 3 | 02114 | 2 |
| 4 | 02141 | 2 |
| 5 | 02124 | 2 |
| 6 | 02122 | 2 |
| 7 | 02135 | 2 |
| 8 | 02445 | 2 |
| 9 | 02149 | 2 |
| 10 | 02119 | 2 |
| 11 | 02169 | 2 |
| 12 | 02148 | 1 |
| 13 | 02171 | 1 |
| 14 | 02121 | 1 |
| 15 | 02151 | 1 |
| 16 | 02120 | 1 |
| 17 | 02446 | 1 |
| 18 | 02129 | 1 |
| 19 | 02170 | 1 |
| 20 | 02143 | 1 |
| 21 | 02155 | 1 |
| 22 | 02116 | 1 |
| 23 | 02128 | 1 |
| 24 | 02126 | 1 |
| 25 | 02459 | 1 |
| 26 | 02132 | 1 |
| 27 | 02140 | 1 |
| 28 | 02127 | 1 |
| 29 | 02472 | 1 |
| 30 | 02144 | 1 |
| 31 | 02478 | 1 |
| 32 | 02186 | 1 |
| 33 | 02131 | 1 |
| 34 | 02150 | 1 |
| 35 | 02474 | 1 |
| 36 | 02136 | 1 |
| 37 | 02184 | 1 |
| 38 | 02139 | 1 |
| 39 | 02115 | 1 |
| 40 | 02108 | 1 |
| 41 | 02118 | 1 |
| 42 | 02134 | 1 |
| 43 | 02145 | 1 |
| 44 | 02113 | 1 |
# Use calc_density to calculate density of public services per ZCTA.
public_service_gdfs = [zcta_fire_count, zcta_police_count, zcta_usps_count, zcta_library_count]
public_service_cols = ['fire_count', 'police_count', 'usps_count', 'library_count']
zcta_public_service = calc_density(rt_zcta, public_service_gdfs, ['ZCTA5CE00'], 'left', public_service_cols, 'public_service_count', 'public_service_density')
# View five most public-service-dense ZCTAs.
zcta_public_service.sort_values(by='public_service_density', ascending=False).head()
| index | STATEFP00 | ZCTA5CE00 | GEOID00 | CLASSFP00 | MTFCC00 | FUNCSTAT00 | ALAND00 | AWATER00 | INTPTLAT00 | INTPTLON00 | PARTFLG00 | geometry | fire_count | police_count | usps_count | library_count | public_service_count | public_service_density | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 55 | 437 | 25 | 02113 | 2502113 | B5 | G6350 | S | 272995 | 0 | +42.3651696 | -071.0553627 | N | POLYGON ((236403.940 901958.860, 236443.429 90... | 1.0 | 0.0 | 1.0 | 1.0 | 3.0 | 10.990018 |
| 42 | 308 | 25 | 02199 | 2502199 | B5 | G6350 | S | 149678 | 0 | +42.3474648 | -071.0820577 | N | POLYGON ((234168.090 899801.270, 234292.861 89... | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 6.681462 |
| 59 | 493 | 25 | 02111 | 2502111 | B5 | G6350 | S | 754214 | 79209 | +42.3487843 | -071.0589880 | N | POLYGON ((236144.915 900687.565, 236153.840 90... | 2.0 | 0.0 | 3.0 | 0.0 | 5.0 | 6.590064 |
| 50 | 379 | 25 | 02114 | 2502114 | B5 | G6350 | S | 1065352 | 343761 | +42.3631745 | -071.0686463 | N | MULTIPOLYGON (((234691.735 900849.190, 234663.... | 1.0 | 2.0 | 2.0 | 2.0 | 7.0 | 5.202709 |
| 33 | 253 | 25 | 02109 | 2502109 | B5 | G6350 | S | 427819 | 0 | +42.3626531 | -071.0538044 | N | MULTIPOLYGON (((236437.338 900792.044, 236429.... | 1.0 | 0.0 | 1.0 | 0.0 | 2.0 | 4.746876 |
# View stats for public_service_density
zcta_public_service.public_service_density.describe()
count 62.000000 mean 1.538382 std 1.909284 min 0.000000 25% 0.556606 50% 0.850057 75% 1.577252 max 10.990018 Name: public_service_density, dtype: float64
# View histogram for distribution of public_service_density.
sns.histplot(data=zcta_public_service, x='public_service_density')
plt.title('Distribution of Public Services Density per ZCTA in Boston Area')
plt.show()
# Map the density of public services in each ZCTA.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
zcta_public_service.plot(column='public_service_density',
legend=True,
edgecolor='black',
cmap='YlGnBu',
legend_kwds={'label': "Public services per sqkm"},
ax=ax)
tufts_bu.plot(ax=ax, color='maroon', markersize=50, zorder=10, label='Tufts and BU')
tufts_bu_paths.plot(ax=ax, color='maroon', alpha=0.5, label='Shortest Paths')
plt.title('Density of Public Services by ZCTA in Boston Area', fontsize=16)
plt.legend()
plt.show()
# Calculate values to separate public service density into five quantiles.
public_service_quints = quantiles(zcta_public_service, 'public_service_density', quintiles)
public_service_quints
[0.5285904514434618, 0.7484243809647726, 1.00445659307057, 1.7628997469448489]
# Reclassify public service density with reclass_5 function and quintile values.
zcta_public_service['public_service_reclass'] = zcta_public_service['public_service_density'].apply(lambda x: reclass_5(x, public_service_quints, 'high'))
# View top and bottom five health-service-dense ZCTAs.
zcta_public_service.sort_values(by='public_service_reclass', ascending=False)
| index | STATEFP00 | ZCTA5CE00 | GEOID00 | CLASSFP00 | MTFCC00 | FUNCSTAT00 | ALAND00 | AWATER00 | INTPTLAT00 | INTPTLON00 | PARTFLG00 | geometry | fire_count | police_count | usps_count | library_count | public_service_count | public_service_density | public_service_reclass | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 50 | 379 | 25 | 02114 | 2502114 | B5 | G6350 | S | 1065352 | 343761 | +42.3631745 | -071.0686463 | N | MULTIPOLYGON (((234691.735 900849.190, 234663.... | 1.0 | 2.0 | 2.0 | 2.0 | 7.0 | 5.202709 | 5 |
| 22 | 176 | 25 | 02163 | 2502163 | B5 | G6350 | S | 316665 | 32165 | +42.3661684 | -071.1228503 | N | POLYGON ((231019.106 902090.531, 231035.630 90... | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 2.866897 | 5 |
| 51 | 380 | 25 | 02120 | 2502120 | B5 | G6350 | S | 1549983 | 0 | +42.3324242 | -071.0957495 | N | POLYGON ((232960.437 898354.494, 232981.813 89... | 0.0 | 1.0 | 2.0 | 1.0 | 4.0 | 2.580836 | 5 |
| 56 | 438 | 25 | 02108 | 2502108 | B5 | G6350 | S | 721093 | 0 | +42.3575539 | -071.0639133 | N | POLYGON ((235436.185 900908.999, 235400.476 90... | 0.0 | 1.0 | 1.0 | 1.0 | 3.0 | 4.160645 | 5 |
| 59 | 493 | 25 | 02111 | 2502111 | B5 | G6350 | S | 754214 | 79209 | +42.3487843 | -071.0589880 | N | POLYGON ((236144.915 900687.565, 236153.840 90... | 2.0 | 0.0 | 3.0 | 0.0 | 5.0 | 6.590064 | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 24 | 226 | 25 | 02468 | 2502468 | B5 | G6350 | S | 4120625 | 57832 | +42.3291748 | -071.2315831 | N | POLYGON ((220628.408 897708.340, 220639.271 89... | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.239338 | 1 |
| 21 | 171 | 25 | 02184 | 2502184 | B5 | G6350 | S | 35903196 | 2081541 | +42.2054158 | -071.0021680 | N | MULTIPOLYGON (((237408.847 882805.326, 237292.... | 3.0 | 1.0 | 3.0 | 1.0 | 8.0 | 0.212762 | 1 |
| 12 | 95 | 25 | 02148 | 2502148 | B5 | G6350 | S | 13153067 | 99285 | +42.4301399 | -071.0576939 | N | MULTIPOLYGON (((238354.542 910494.597, 238359.... | 3.0 | 1.0 | 2.0 | 1.0 | 7.0 | 0.528238 | 1 |
| 7 | 54 | 25 | 02132 | 2502132 | B5 | G6350 | S | 11715729 | 249516 | +42.2809050 | -071.1639646 | N | POLYGON ((228857.652 890423.158, 228862.932 89... | 2.0 | 1.0 | 1.0 | 1.0 | 5.0 | 0.417906 | 1 |
| 61 | 495 | 25 | 02222 | 2502222 | B5 | G6350 | S | 77896 | 0 | +42.3664695 | -071.0624939 | N | POLYGON ((235839.790 902134.685, 235852.488 90... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 1 |
62 rows × 20 columns
# Map reclassified public service density in each ZCTA.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
zcta_public_service.plot(column='public_service_reclass',
legend=True,
edgecolor='black',
cmap='plasma_r',
legend_kwds={'label': "Reclassified public service density"},
ax=ax)
tufts_bu.plot(ax=ax, color='aquamarine', markersize=50, zorder=10, label='Tufts and BU')
tufts_bu_paths.plot(ax=ax, color='aquamarine', alpha=0.5, label='Shortest Paths')
plt.title('Reclassified Density of Public Services by ZCTA in Boston Area', fontsize=16)
plt.legend()
plt.show()
# View ZCTAs with the highest public service density (over 4.3 establishments/sqkm).
zcta_public_service.loc[zcta_public_service.public_service_reclass==5]['ZCTA5CE00']
1 02141 17 02115 22 02163 33 02109 40 02215 42 02199 48 02110 49 02118 50 02114 51 02120 55 02113 56 02108 59 02111 Name: ZCTA5CE00, dtype: object
Once again, a similar trend is found for public services density as for food and health services density. The central part of the city has the most amenities per sqkm.
While necessities like food, health, and safety are crucial to living anywhere, living requires leisure as well--life cannot be all work and no play! Leisure features were found using OSMnx and analyzed with the same process as other indicators.
Leisure features were defined using OSM tags for leisure, which define leisure features as "places people go in their spare time". Additional tags were included for amenities, such as bars and theaters, and tourist sites, such as aquariums and museums.
# Retrieve leisure features within graph_extent_latlong from OSMnx and view info.
leisure_tags = {'leisure':True,
'amenity':['bar', 'biergarten', 'ice_cream', 'pub', 'bicycle_rental', 'boat_rental', 'car_rental', 'ferry_terminal', 'language_school', 'toy_library', 'music_school', 'arts_centre', 'casino', 'cinema', 'community_centre', 'gambling', 'nightclub', 'planetarium', 'social_centre', 'theatre', 'bbq'],
'tourism':['aquarium', 'artwork', 'attraction', 'gallery', 'museum', 'theme_park', 'viewpoint', 'zoo'],
'shop':True,
'sport':True}
leisure = ox.geometries_from_polygon(graph_extent_latlong, leisure_tags)
leisure = convert_n_clip(leisure, rt_zcta)
leisure.info()
<class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 7955 entries, 1 to 9568 Columns: 532 entries, unique_id to lanes:sprint dtypes: geometry(1), int64(1), object(530) memory usage: 32.3+ MB
# View first five rows of leisure GDF.
leisure.head()
| unique_id | osmid | element_type | name | tourism | geometry | highway | attribution | leisure | source | ... | min_level | garden:style | storage | storage_area | theatre | heating | ways | history | operation | lanes:sprint | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | node/61341459 | 61341459 | node | NaN | NaN | POINT (234531.223 895737.946) | NaN | Office of Geographic and Environmental Informa... | yes | massgis_import_v0.1_20071008193615 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | node/66437893 | 66437893 | node | NaN | viewpoint | POINT (228979.607 908978.068) | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | node/67422140 | 67422140 | node | Heartbreak Hill | attraction | POINT (226487.979 898452.197) | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | node/73105560 | 73105560 | node | NaN | viewpoint | POINT (238002.707 894596.205) | NaN | Office of Geographic and Environmental Informa... | NaN | massgis_import_v0.1_20071009092358 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 5 | node/246650672 | 246650672 | node | Harris Cyclery | NaN | POINT (222365.707 899867.732) | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 532 columns
# Use count_records on leisure.
zcta_leisure_count = count_records(leisure, rt_zcta, 'ZCTA5CE00', 'leisure_count')
zcta_leisure_count
| ZCTA5CE00 | leisure_count | |
|---|---|---|
| 0 | 02116 | 553 |
| 1 | 02138 | 384 |
| 2 | 02139 | 351 |
| 3 | 02128 | 315 |
| 4 | 02130 | 265 |
| ... | ... | ... |
| 57 | 02461 | 29 |
| 58 | 02222 | 21 |
| 59 | 02468 | 15 |
| 60 | 02464 | 13 |
| 61 | 02163 | 7 |
62 rows × 2 columns
leisure_gdfs = [zcta_leisure_count] leisure_cols = ['leisure_count'] zcta_leisure2 = calc_density(rt_zcta, leisure_gdfs, ['ZCTA5CE00'], 'left', leisure_cols, 'leisure_count', 'leisure_density')
# Merge statistics with spatial ZCTA rental GDF.
zcta_leisure = rt_zcta.merge(zcta_leisure_count, on='ZCTA5CE00', how='left')
zcta_leisure.info()
<class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 62 entries, 0 to 61 Data columns (total 14 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 index 62 non-null int64 1 STATEFP00 62 non-null object 2 ZCTA5CE00 62 non-null object 3 GEOID00 62 non-null object 4 CLASSFP00 62 non-null object 5 MTFCC00 62 non-null object 6 FUNCSTAT00 62 non-null object 7 ALAND00 62 non-null int64 8 AWATER00 62 non-null int64 9 INTPTLAT00 62 non-null object 10 INTPTLON00 62 non-null object 11 PARTFLG00 62 non-null object 12 geometry 62 non-null geometry 13 leisure_count 62 non-null int64 dtypes: geometry(1), int64(4), object(9) memory usage: 7.3+ KB
zcta_leisure2.sort_values(by='leisure_density', ascending=False).head()
# Calculate node density in leisure/sqkm.
zcta_leisure['leisure_density'] = zcta_leisure['leisure_count']/zcta_leisure.area*(10**6)
zcta_leisure.sort_values(by='leisure_density', ascending=False).head()
| index | STATEFP00 | ZCTA5CE00 | GEOID00 | CLASSFP00 | MTFCC00 | FUNCSTAT00 | ALAND00 | AWATER00 | INTPTLAT00 | INTPTLON00 | PARTFLG00 | geometry | leisure_count | leisure_density | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 42 | 308 | 25 | 02199 | 2502199 | B5 | G6350 | S | 149678 | 0 | +42.3474648 | -071.0820577 | N | POLYGON ((234168.090 899801.270, 234292.861 89... | 90 | 601.331570 |
| 53 | 435 | 25 | 02116 | 2502116 | B5 | G6350 | S | 1712064 | 222385 | +42.3499907 | -071.0760802 | N | POLYGON ((234702.966 899809.334, 234675.900 89... | 553 | 285.887150 |
| 61 | 495 | 25 | 02222 | 2502222 | B5 | G6350 | S | 77896 | 0 | +42.3664695 | -071.0624939 | N | POLYGON ((235839.790 902134.685, 235852.488 90... | 21 | 269.677305 |
| 56 | 438 | 25 | 02108 | 2502108 | B5 | G6350 | S | 721093 | 0 | +42.3575539 | -071.0639133 | N | POLYGON ((235436.185 900908.999, 235400.476 90... | 137 | 190.002785 |
| 33 | 253 | 25 | 02109 | 2502109 | B5 | G6350 | S | 427819 | 0 | +42.3626531 | -071.0538044 | N | MULTIPOLYGON (((236437.338 900792.044, 236429.... | 73 | 173.260969 |
# View statistics for leisure_density.
zcta_leisure.leisure_density.describe()
count 62.000000 mean 53.904875 std 93.579660 min 3.467852 25% 8.813364 50% 20.305715 75% 53.653825 max 601.331570 Name: leisure_density, dtype: float64
# View histogram for distribution of leisure_density.
sns.histplot(data=zcta_leisure, x='leisure_density')
plt.title('Distribution of Leisure Density per ZCTA in Boston Area')
plt.show()
The outlier for leisure density is ZCTA 02199, the Prudential Center.
# Map the leisure density in each ZCTA.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
zcta_leisure.plot(column='leisure_density',
legend=True,
edgecolor='black',
cmap='YlGnBu',
legend_kwds={'label': "Leisure locations per sqkm"},
ax=ax)
tufts_bu.plot(ax=ax, color='maroon', markersize=50, zorder=10, label='Tufts and BU')
tufts_bu_paths.plot(ax=ax, color='maroon', alpha=0.5, label='Shortest Paths')
plt.title('Leisure Density by ZCTA in Boston Area', fontsize=16)
plt.legend()
plt.show()
# Calculate values to separate leisure into five quantiles.
# Use same quintiles list of thresholds from reclassifying transit.
leisure_quints = quantiles(zcta_leisure, 'leisure_density', quintiles)
leisure_quints
[7.501765917211352, 13.238439468670206, 26.617674931638195, 57.78530097172477]
# Reclassify leisure density with reclass_5 function and quintile values.
zcta_leisure['leisure_reclass'] = zcta_leisure['leisure_density'].apply(lambda x: reclass_5(x, leisure_quints, 'high'))
# View top and bottom five median rental ZCTAs.
zcta_leisure.sort_values(by='leisure_reclass', ascending=False)
| index | STATEFP00 | ZCTA5CE00 | GEOID00 | CLASSFP00 | MTFCC00 | FUNCSTAT00 | ALAND00 | AWATER00 | INTPTLAT00 | INTPTLON00 | PARTFLG00 | geometry | leisure_count | leisure_density | leisure_reclass | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 61 | 495 | 25 | 02222 | 2502222 | B5 | G6350 | S | 77896 | 0 | +42.3664695 | -071.0624939 | N | POLYGON ((235839.790 902134.685, 235852.488 90... | 21 | 269.677305 | 5 |
| 57 | 482 | 25 | 02139 | 2502139 | B5 | G6350 | S | 3784723 | 368360 | +42.3622881 | -071.1037905 | N | POLYGON ((232130.866 902283.249, 232144.254 90... | 351 | 84.521054 | 5 |
| 36 | 288 | 25 | 02140 | 2502140 | B5 | G6350 | S | 2905552 | 51943 | +42.3921567 | -071.1339958 | N | POLYGON ((229516.318 904689.286, 229459.537 90... | 171 | 57.822486 | 5 |
| 40 | 306 | 25 | 02215 | 2502215 | B5 | G6350 | S | 1984378 | 279591 | +42.3470532 | -071.1019853 | N | POLYGON ((230882.811 900191.721, 230873.568 90... | 201 | 88.787369 | 5 |
| 33 | 253 | 25 | 02109 | 2502109 | B5 | G6350 | S | 427819 | 0 | +42.3626531 | -071.0538044 | N | MULTIPOLYGON (((236437.338 900792.044, 236429.... | 73 | 173.260969 | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 43 | 365 | 25 | 02466 | 2502466 | B5 | G6350 | S | 4650253 | 263098 | +42.3444566 | -071.2486166 | N | POLYGON ((219787.259 899807.604, 219812.724 89... | 34 | 6.920364 | 1 |
| 11 | 58 | 25 | 02126 | 2502126 | B5 | G6350 | S | 5260166 | 67030 | +42.2748559 | -071.0929083 | N | MULTIPOLYGON (((232299.828 890594.954, 232285.... | 37 | 6.992167 | 1 |
| 7 | 54 | 25 | 02132 | 2502132 | B5 | G6350 | S | 11715729 | 249516 | +42.2809050 | -071.1639646 | N | POLYGON ((228857.652 890423.158, 228862.932 89... | 70 | 5.850685 | 1 |
| 30 | 250 | 25 | 02131 | 2502131 | B5 | G6350 | S | 7924684 | 28885 | +42.2830207 | -071.1250273 | N | POLYGON ((229122.720 892601.111, 229102.305 89... | 37 | 4.652311 | 1 |
| 38 | 298 | 25 | 02171 | 2502171 | B5 | G6350 | S | 6439951 | 110355 | +42.2861505 | -071.0259433 | N | MULTIPOLYGON (((239149.911 894448.989, 239180.... | 37 | 5.674231 | 1 |
62 rows × 16 columns
# Map the median rent price in each ZCTA.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
zcta_leisure.plot(column='leisure_reclass',
legend=True,
edgecolor='black',
cmap='plasma_r',
legend_kwds={'label': "Reclassified leisure density"},
ax=ax)
tufts_bu.plot(ax=ax, color='aquamarine', markersize=50, zorder=10, label='Tufts and BU')
tufts_bu_paths.plot(ax=ax, color='aquamarine', alpha=0.5, label='Shortest Paths')
plt.title('Reclassified Leisure Density by ZCTA in Boston Area', fontsize=16)
plt.legend()
plt.show()
# View ZCTAs with the highest leisure density (over 48 establishments/sqkm).
zcta_leisure.loc[zcta_leisure.leisure_reclass==5]['ZCTA5CE00']
17 02115 33 02109 36 02140 40 02215 42 02199 48 02110 50 02114 53 02116 55 02113 56 02108 57 02139 59 02111 61 02222 Name: ZCTA5CE00, dtype: object
Needless to say, the trend of the most amenities in the central area of Boston holds for leisure establsihments as well.
Analyzing the reclassification maps of each indicator (mass transit, rent affordability, food establishment, health services, and public services), there is significant variation in desirable ZCTAs for each indicator. A weighted index was created to compare overall suitability across all indicators.
# Add analysis and reclass columns to base rt_zcta using function multimerge and view info to confirm success.
indicators_list = [zcta_nodes, zcta_rent, zcta_food, zcta_health, zcta_public_service, zcta_leisure]
on_cols = ['index', 'STATEFP00', 'ZCTA5CE00', 'GEOID00', 'CLASSFP00', 'MTFCC00', 'FUNCSTAT00', 'ALAND00', 'AWATER00', 'INTPTLAT00', 'INTPTLON00', 'PARTFLG00', 'geometry']
zcta_index = multimerge(rt_zcta, indicators_list, on_cols, 'left')
zcta_index.info()
<class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 62 entries, 0 to 61 Data columns (total 48 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 index 62 non-null int64 1 STATEFP00 62 non-null object 2 ZCTA5CE00 62 non-null object 3 GEOID00 62 non-null object 4 CLASSFP00 62 non-null object 5 MTFCC00 62 non-null object 6 FUNCSTAT00 62 non-null object 7 ALAND00 62 non-null int64 8 AWATER00 62 non-null int64 9 INTPTLAT00 62 non-null object 10 INTPTLON00 62 non-null object 11 PARTFLG00 62 non-null object 12 geometry 62 non-null geometry 13 bus_stop_count 62 non-null float64 14 rt_stop_count 62 non-null float64 15 train_stop_count 62 non-null float64 16 nodes_count 62 non-null float64 17 nodes_density 62 non-null float64 18 nodes_reclass 62 non-null int64 19 rent_len 53 non-null float64 20 rent_min 53 non-null float64 21 rent_max 53 non-null float64 22 rent_median 53 non-null float64 23 rent_mean 53 non-null float64 24 rent_std 53 non-null float64 25 rent_median_reclass 53 non-null float64 26 grocery_count 62 non-null float64 27 prep_food_count 62 non-null float64 28 farmer_mrkt_count 62 non-null float64 29 food_count 62 non-null float64 30 food_density 62 non-null float64 31 food_reclass 62 non-null int64 32 comm_health_count 62 non-null float64 33 hospitals_count 62 non-null float64 34 healthcare_count 62 non-null float64 35 health_count 62 non-null float64 36 health_density 62 non-null float64 37 health_reclass 62 non-null int64 38 fire_count 62 non-null float64 39 police_count 62 non-null float64 40 usps_count 62 non-null float64 41 library_count 62 non-null float64 42 public_service_count 62 non-null float64 43 public_service_density 62 non-null float64 44 public_service_reclass 62 non-null int64 45 leisure_count 62 non-null int64 46 leisure_density 62 non-null float64 47 leisure_reclass 62 non-null int64 dtypes: float64(29), geometry(1), int64(9), object(9) memory usage: 23.7+ KB
A weighted index is highly subjective. Some may value availability of food over cost of rent, while others may value convenience of public transit over all else. Others still may value leisure above all else.
For the purposes of this study (ostensibly concerned with living as a student without a car), rent affordability was given the most weight at 35%, while transit was given 20% and food was given 15% (as transit can be used to go to more food-dense areas). Though health services and public services are categorized as necessities, leisure is sought after more often than either health or public services and was thus given 15%. Health and public services were therefore given 7.5% each.
Because rental data was only available in a segment of ZCTAs, NaN values were preserved across index calculations to ensure a fair comparison.
# Calculate weighted index and view info.
zcta_index['weighted'] = (zcta_index.nodes_reclass*0.2 +
zcta_index.rent_median_reclass*0.35 +
zcta_index.food_reclass*0.15 +
zcta_index.health_reclass*0.075 +
zcta_index.public_service_reclass*0.075 +
zcta_index.leisure_reclass*0.15)
zcta_index.info()
<class 'geopandas.geodataframe.GeoDataFrame'> Int64Index: 62 entries, 0 to 61 Data columns (total 49 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 index 62 non-null int64 1 STATEFP00 62 non-null object 2 ZCTA5CE00 62 non-null object 3 GEOID00 62 non-null object 4 CLASSFP00 62 non-null object 5 MTFCC00 62 non-null object 6 FUNCSTAT00 62 non-null object 7 ALAND00 62 non-null int64 8 AWATER00 62 non-null int64 9 INTPTLAT00 62 non-null object 10 INTPTLON00 62 non-null object 11 PARTFLG00 62 non-null object 12 geometry 62 non-null geometry 13 bus_stop_count 62 non-null float64 14 rt_stop_count 62 non-null float64 15 train_stop_count 62 non-null float64 16 nodes_count 62 non-null float64 17 nodes_density 62 non-null float64 18 nodes_reclass 62 non-null int64 19 rent_len 53 non-null float64 20 rent_min 53 non-null float64 21 rent_max 53 non-null float64 22 rent_median 53 non-null float64 23 rent_mean 53 non-null float64 24 rent_std 53 non-null float64 25 rent_median_reclass 53 non-null float64 26 grocery_count 62 non-null float64 27 prep_food_count 62 non-null float64 28 farmer_mrkt_count 62 non-null float64 29 food_count 62 non-null float64 30 food_density 62 non-null float64 31 food_reclass 62 non-null int64 32 comm_health_count 62 non-null float64 33 hospitals_count 62 non-null float64 34 healthcare_count 62 non-null float64 35 health_count 62 non-null float64 36 health_density 62 non-null float64 37 health_reclass 62 non-null int64 38 fire_count 62 non-null float64 39 police_count 62 non-null float64 40 usps_count 62 non-null float64 41 library_count 62 non-null float64 42 public_service_count 62 non-null float64 43 public_service_density 62 non-null float64 44 public_service_reclass 62 non-null int64 45 leisure_count 62 non-null int64 46 leisure_density 62 non-null float64 47 leisure_reclass 62 non-null int64 48 weighted 53 non-null float64 dtypes: float64(30), geometry(1), int64(9), object(9) memory usage: 24.2+ KB
# View stats on weighted index.
zcta_index.weighted.describe()
count 53.000000 mean 3.133491 std 0.552049 min 1.500000 25% 2.850000 50% 3.225000 75% 3.600000 max 3.950000 Name: weighted, dtype: float64
# Sort by weighted index and reset index
zcta_index = zcta_index.sort_values(by='weighted', ascending=False).reset_index()
# View five most suitable ZCTAs.
zcta_index.head()
| level_0 | index | STATEFP00 | ZCTA5CE00 | GEOID00 | CLASSFP00 | MTFCC00 | FUNCSTAT00 | ALAND00 | AWATER00 | ... | police_count | usps_count | library_count | public_service_count | public_service_density | public_service_reclass | leisure_count | leisure_density | leisure_reclass | weighted | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 40 | 306 | 25 | 02215 | 2502215 | B5 | G6350 | S | 1984378 | 279591 | ... | 0.0 | 4.0 | 0.0 | 4.0 | 1.766913 | 5 | 201 | 88.787369 | 5 | 3.950 |
| 1 | 17 | 115 | 25 | 02115 | 2502115 | B5 | G6350 | S | 1979415 | 162098 | ... | 0.0 | 1.0 | 1.0 | 4.0 | 1.867961 | 5 | 205 | 95.732991 | 5 | 3.950 |
| 2 | 13 | 96 | 25 | 02145 | 2502145 | B5 | G6350 | S | 3628803 | 235197 | ... | 0.0 | 1.0 | 1.0 | 3.0 | 0.784470 | 3 | 164 | 42.884379 | 4 | 3.925 |
| 3 | 2 | 36 | 25 | 02143 | 2502143 | B5 | G6350 | S | 4007477 | 0 | ... | 1.0 | 2.0 | 1.0 | 7.0 | 1.746847 | 4 | 216 | 53.902721 | 4 | 3.925 |
| 4 | 39 | 305 | 25 | 02135 | 2502135 | B5 | G6350 | S | 7222258 | 469151 | ... | 2.0 | 1.0 | 2.0 | 7.0 | 0.910165 | 3 | 126 | 16.382977 | 3 | 3.775 |
5 rows × 50 columns
# View five least suitable ZCTAs, taking into account null values.
zcta_index.loc[:52].tail()
| level_0 | index | STATEFP00 | ZCTA5CE00 | GEOID00 | CLASSFP00 | MTFCC00 | FUNCSTAT00 | ALAND00 | AWATER00 | ... | police_count | usps_count | library_count | public_service_count | public_service_density | public_service_reclass | leisure_count | leisure_density | leisure_reclass | weighted | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 48 | 45 | 369 | 25 | 02142 | 2502142 | B5 | G6350 | S | 1053187 | 577829 | ... | 0.0 | 1.0 | 0.0 | 1.0 | 0.613155 | 2 | 94 | 57.636559 | 4 | 2.200 |
| 49 | 7 | 54 | 25 | 02132 | 2502132 | B5 | G6350 | S | 11715729 | 249516 | ... | 1.0 | 1.0 | 1.0 | 5.0 | 0.417906 | 1 | 70 | 5.850685 | 1 | 2.175 |
| 50 | 16 | 107 | 25 | 02445 | 2502445 | B5 | G6350 | S | 13598475 | 147868 | ... | 1.0 | 1.0 | 2.0 | 8.0 | 0.582012 | 2 | 181 | 13.168013 | 2 | 2.150 |
| 51 | 52 | 429 | 25 | 02186 | 2502186 | B5 | G6350 | S | 36334750 | 436915 | ... | 2.0 | 3.0 | 1.0 | 9.0 | 0.245753 | 1 | 127 | 3.467852 | 1 | 1.700 |
| 52 | 46 | 370 | 25 | 02467 | 2502467 | B5 | G6350 | S | 5755959 | 108004 | ... | 0.0 | 2.0 | 0.0 | 2.0 | 0.341088 | 1 | 33 | 5.627960 | 1 | 1.500 |
5 rows × 50 columns
# Map the weighted index in each ZCTA.
ax = rt_zcta.plot(color='lightgrey', edgecolor='grey', figsize=(12,12))
tufts_bu.plot(ax=ax, color='aquamarine', markersize=50, zorder=10, label='Tufts and BU')
tufts_bu_paths.plot(ax=ax, color='aquamarine', alpha=0.5, label='Shortest Paths')
zcta_index.plot(column='weighted',
legend=True,
edgecolor='black',
cmap='plasma_r',
legend_kwds={'label': "Weighted suitability index"},
ax=ax)
plt.title('Weighted Suitability of Living in Boston Area ZCTAs', fontsize=16)
plt.legend()
plt.show()
Folium to View Interactively¶While the above choropleth map is insightful, an interactive map is more helpful to view actual locations with reference points beyond Tufts and BU. Interactive maps were created by converting all relevant GDFs to latitude-longitude and plotting them with Folium.
# Convert zcta_index GDF to latlong (denoted with ll).
# Set index to ZCTA names.
zcta_index_ll = zcta_index.to_crs('epsg:4326').set_index('ZCTA5CE00')
# View first five rows.
zcta_index_ll.head()
| level_0 | index | STATEFP00 | GEOID00 | CLASSFP00 | MTFCC00 | FUNCSTAT00 | ALAND00 | AWATER00 | INTPTLAT00 | ... | police_count | usps_count | library_count | public_service_count | public_service_density | public_service_reclass | leisure_count | leisure_density | leisure_reclass | weighted | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ZCTA5CE00 | |||||||||||||||||||||
| 02215 | 40 | 306 | 25 | 2502215 | B5 | G6350 | S | 1984378 | 279591 | +42.3470532 | ... | 0.0 | 4.0 | 0.0 | 4.0 | 1.766913 | 5 | 201 | 88.787369 | 5 | 3.950 |
| 02115 | 17 | 115 | 25 | 2502115 | B5 | G6350 | S | 1979415 | 162098 | +42.3375449 | ... | 0.0 | 1.0 | 1.0 | 4.0 | 1.867961 | 5 | 205 | 95.732991 | 5 | 3.950 |
| 02145 | 13 | 96 | 25 | 2502145 | B5 | G6350 | S | 3628803 | 235197 | +42.3915769 | ... | 0.0 | 1.0 | 1.0 | 3.0 | 0.784470 | 3 | 164 | 42.884379 | 4 | 3.925 |
| 02143 | 2 | 36 | 25 | 2502143 | B5 | G6350 | S | 4007477 | 0 | +42.3815721 | ... | 1.0 | 2.0 | 1.0 | 7.0 | 1.746847 | 4 | 216 | 53.902721 | 4 | 3.925 |
| 02135 | 39 | 305 | 25 | 2502135 | B5 | G6350 | S | 7222258 | 469151 | +42.3484167 | ... | 2.0 | 1.0 | 2.0 | 7.0 | 0.910165 | 3 | 126 | 16.382977 | 3 | 3.775 |
5 rows × 49 columns
# Create new column with zipcodes for labeling the map and confirm success.
zcta_index_ll['zipcode'] = zcta_index_ll.index
zcta_index_ll.head()
| level_0 | index | STATEFP00 | GEOID00 | CLASSFP00 | MTFCC00 | FUNCSTAT00 | ALAND00 | AWATER00 | INTPTLAT00 | ... | usps_count | library_count | public_service_count | public_service_density | public_service_reclass | leisure_count | leisure_density | leisure_reclass | weighted | zipcode | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ZCTA5CE00 | |||||||||||||||||||||
| 02215 | 40 | 306 | 25 | 2502215 | B5 | G6350 | S | 1984378 | 279591 | +42.3470532 | ... | 4.0 | 0.0 | 4.0 | 1.766913 | 5 | 201 | 88.787369 | 5 | 3.950 | 02215 |
| 02115 | 17 | 115 | 25 | 2502115 | B5 | G6350 | S | 1979415 | 162098 | +42.3375449 | ... | 1.0 | 1.0 | 4.0 | 1.867961 | 5 | 205 | 95.732991 | 5 | 3.950 | 02115 |
| 02145 | 13 | 96 | 25 | 2502145 | B5 | G6350 | S | 3628803 | 235197 | +42.3915769 | ... | 1.0 | 1.0 | 3.0 | 0.784470 | 3 | 164 | 42.884379 | 4 | 3.925 | 02145 |
| 02143 | 2 | 36 | 25 | 2502143 | B5 | G6350 | S | 4007477 | 0 | +42.3815721 | ... | 2.0 | 1.0 | 7.0 | 1.746847 | 4 | 216 | 53.902721 | 4 | 3.925 | 02143 |
| 02135 | 39 | 305 | 25 | 2502135 | B5 | G6350 | S | 7222258 | 469151 | +42.3484167 | ... | 1.0 | 2.0 | 7.0 | 0.910165 | 3 | 126 | 16.382977 | 3 | 3.775 | 02135 |
5 rows × 50 columns
# Create a Series of the weighted index and view.
weighted = zcta_index_ll.weighted
weighted
ZCTA5CE00
02215 3.950
02115 3.950
02145 3.925
02143 3.925
02135 3.775
...
02460 NaN
02461 NaN
02459 NaN
02466 NaN
02465 NaN
Name: weighted, Length: 62, dtype: float64
# Check type to make sure it's a Series.
type(weighted)
pandas.core.series.Series
# Convert Tufts and BU GDFs.
tufts_bu_paths_ll = tufts_bu_paths.to_crs('epsg:4326')
tufts_bu_ll = tufts_bu.to_crs('epsg:4326')
# Create lat-long columns from point geometry of Tufts and BU locations.
tufts_bu_ll['latitude'] = tufts_bu_ll.geometry.y
tufts_bu_ll['longitude'] = tufts_bu_ll.geometry.x
tufts_bu_ll.head()
| COLLEGE | CAMPUS | ADDRESS | CITY | ZIPCODE | PLUS_FOUR | GEOG_TOWN | MAIN_TEL | URL | NCES_ID | ... | CATEGORY | DEGREEOFFR | AWARDSOFFR | LARGEPROG | CAMPUSSETT | CAMPUSHOUS | L_SRC | geometry | latitude | longitude | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 136 | Tufts University | Medford/Somerville Campus | 419 Boston Avenue | Medford | 02155 | None | MEDFORD | (617) 628-5000 | http://www.tufts.edu | 168148 | ... | Research University | C, B, PBC, M, PMC, D | Two but less than 4 years certificate;Bachelor... | None | Suburb: Large | Yes | nces.ed.gov | POINT (-71.11834 42.40851) | 42.408509 | -71.118344 |
| 163 | Boston University | Charles River Campus | 1 Silber Way | Boston | 02215 | None | BOSTON | (617) 353-2000 | http://www.bu.edu | 164988 | ... | Research University | C, B, PBC, M, PMC, D | Less than one year certificate;One but less th... | None | City: Large | Yes | nces.ed.gov | POINT (-71.09968 42.34960) | 42.349596 | -71.099677 |
2 rows × 22 columns
# Map weighted suitability with schools and routes.
m_1 = folium.Map(location=[42.38, -71.10], tiles='openstreetmap', zoom_start=12)
# Add choropleth with weighted suitability.
suitability = folium.Choropleth(geo_data=zcta_index_ll.__geo_interface__,
data=weighted,
key_on='feature.id',
fill_color='BuPu',
fill_opacity=0.5,
highlight=True,
legend_name='Weighted Suitability').add_to(m_1)
# Add labels to ZCTAs.
suitability.geojson.add_child(
folium.features.GeoJsonTooltip(['zipcode', 'weighted'], labels=True))
# Add Tufts and BU to the map.
for idx, row in tufts_bu_ll.iterrows():
folium.Marker([row['latitude'], row['longitude']],
icon=folium.Icon(color='orange', icon='book')).add_to(m_1)
# Add Tufts and BU paths to the map.
folium.Choropleth(tufts_bu_paths_ll.geometry, line_weight=3.5, line_color='orange').add_to(m_1)
m_1
# Convert route GDFs to latlong.
bos_bus_route_ll = bos_bus_route.to_crs('epsg:4326')
bos_rt_route_ll = bos_rt_route.to_crs('epsg:4326')
bos_train_route_ll = bos_train_route.to_crs('epsg:4326')
# Map routes on basemap with muted colors.
m_2 = folium.Map(location=[42.38, -71.10], tiles='cartodbpositron', zoom_start=12)
# Add Tufts and BU to the map.
for idx, row in tufts_bu_ll.iterrows():
folium.Marker([row['latitude'], row['longitude']],
icon=folium.Icon(color='orange', icon='book')).add_to(m_2)
# Add Tufts and BU paths to the map.
folium.Choropleth(tufts_bu_paths_ll.geometry, line_weight=5, line_color='orange').add_to(m_2)
# Map bus routes
bus = folium.Choropleth(bos_bus_route_ll.geometry,
line_weight=1,
line_color='red',
highlight=True).add_to(m_2)
# Map T routes
mbta_t = folium.Choropleth(bos_rt_route_ll.geometry,
line_weight=3,
line_color='green',
highlight=True).add_to(m_2)
# Map commuter rail routes
train = folium.Choropleth(bos_train_route_ll.geometry,
line_weight=3,
line_color='purple',
highlight=True).add_to(m_2)
m_2